Overview

Dataset statistics

Number of variables: 26
Number of observations: 14825
Missing cells: 333
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.9 MiB
Average record size in memory: 208.0 B
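
The derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the total size is roughly the average record size times the number of rows; both are assumptions about how the report computes them, not something the report states.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 26
n_observations = 14825
missing_cells = 333
avg_record_bytes = 208.0

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size is about rows * average record size (1 MiB = 2**20 bytes).
total_mib = n_observations * avg_record_bytes / 2**20

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"total size:    {total_mib:.1f} MiB")
```

Under these assumptions the rounded results agree with the reported 0.1% and 2.9 MiB.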

Variable types

Numeric: 8
Categorical: 18

Alerts

Name has a high cardinality: 13660 distinct values
NAICSDescr has a high cardinality: 654 distinct values
Phone has a high cardinality: 14090 distinct values
Fax has a high cardinality: 8296 distinct values
TollFree has a high cardinality: 2149 distinct values
EMail has a high cardinality: 9793 distinct values
WebAddress has a high cardinality: 9717 distinct values
StreetName has a high cardinality: 604 distinct values
Address has a high cardinality: 5587 distinct values
PostalCode has a high cardinality: 2689 distinct values
BldgNo has a high cardinality: 61 distinct values
UnitNo has a high cardinality: 1602 distinct values
Modified has a high cardinality: 189 distinct values
CHArea has a high cardinality: 56 distinct values
FID is highly overall correlated with StreetNo and 7 other fields
NAICSCode is highly overall correlated with NAICSTitle
PIN is highly overall correlated with FID and 5 other fields
Ward is highly overall correlated with FID and 7 other fields
X is highly overall correlated with FID and 7 other fields
Y is highly overall correlated with FID and 7 other fields
NAICSTitle is highly overall correlated with NAICSCode and 1 other field
CHArea is highly overall correlated with FID and 9 other fields
BIA_NAME is highly overall correlated with FID and 5 other fields
BIAFulName is highly overall correlated with FID and 5 other fields
StreetNo is highly overall correlated with FID and 5 other fields
BldgNo is highly overall correlated with CHArea
EmplRange has 323 (2.2%) missing values
FID is uniformly distributed
FID has unique values
ID has unique values

Reproduction

Analysis started: 2023-01-31 22:39:24.712531
Analysis finished: 2023-01-31 22:39:49.409968
Duration: 24.7 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json
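
The reported duration follows directly from the two timestamps. A minimal check using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamp layout used in the "Reproduction" block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-01-31 22:39:24.712531", FMT)
finished = datetime.strptime("2023-01-31 22:39:49.409968", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.1f} seconds")
```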

Variables

FID
Real number (ℝ)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 14825
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7413
Minimum: 1
Maximum: 14825
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 115.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 742.2
Q1: 3707
Median: 7413
Q3: 11119
95-th percentile: 14083.8
Maximum: 14825
Range: 14824
Interquartile range (IQR): 7412

Descriptive statistics

Standard deviation: 4279.7532
Coefficient of variation (CV): 0.5773308
Kurtosis: -1.2
Mean: 7413
Median Absolute Deviation (MAD): 3706
Skewness: 0
Sum: 1.0989772 × 10^8
Variance: 18316288
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
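
FID is unique, strictly increasing, and spans 1 to 14825 over 14825 rows, so it behaves like a plain row index 1..n. The sketch below recomputes the main descriptive statistics for such a sequence. It assumes the report uses the sample variance (dividing by n - 1). The kurtosis line uses the closed-form population value for a discrete uniform distribution, which the report's estimator may approximate slightly differently.

```python
import math

n = 14825                      # number of observations; FID runs from 1 to n
values = range(1, n + 1)

total = sum(values)            # closed form: n * (n + 1) / 2
mean = total / n               # closed form: (n + 1) / 2

# Sample variance (divide by n - 1); closed form for 1..n: n * (n + 1) / 12
variance = sum((v - mean) ** 2 for v in values) / (n - 1)
std = math.sqrt(variance)
cv = std / mean

# Population excess kurtosis of a discrete uniform distribution on 1..n.
excess_kurtosis = -6 * (n**2 + 1) / (5 * (n**2 - 1))

print(total, mean, variance, round(std, 4), round(cv, 7), round(excess_kurtosis, 1))
```

The results agree with the reported sum, mean, variance, standard deviation, and CV, which suggests FID carries no information beyond row order.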
Common values

Value                  Count   Frequency (%)
1                      1       < 0.1%
9875                   1       < 0.1%
9877                   1       < 0.1%
9878                   1       < 0.1%
9879                   1       < 0.1%
9880                   1       < 0.1%
9881                   1       < 0.1%
9882                   1       < 0.1%
9883                   1       < 0.1%
9884                   1       < 0.1%
Other values (14815)   14815   99.9%

Minimum 10 values

Value   Count   Frequency (%)
1       1       < 0.1%
2       1       < 0.1%
3       1       < 0.1%
4       1       < 0.1%
5       1       < 0.1%
6       1       < 0.1%
7       1       < 0.1%
8       1       < 0.1%
9       1       < 0.1%
10      1       < 0.1%

Maximum 10 values

Value   Count   Frequency (%)
14825   1       < 0.1%
14824   1       < 0.1%
14823   1       < 0.1%
14822   1       < 0.1%
14821   1       < 0.1%
14820   1       < 0.1%
14819   1       < 0.1%
14818   1       < 0.1%
14817   1       < 0.1%
14816   1       < 0.1%

ID
Real number (ℝ)

Distinct: 14825
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39096.578
Minimum: 7
Maximum: 94424
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 115.9 KiB

Quantile statistics

Minimum: 7
5-th percentile: 2351.8
Q1: 10541
Median: 21076
Q3: 58669
95-th percentile: 91774.6
Maximum: 94424
Range: 94417
Interquartile range (IQR): 48128

Descriptive statistics

Standard deviation: 32104.758
Coefficient of variation (CV): 0.82116542
Kurtosis: -1.3113109
Mean: 39096.578
Median Absolute Deviation (MAD): 19842
Skewness: 0.4507187
Sum: 5.7960677 × 10^8
Variance: 1.0307155 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
94017                  1       < 0.1%
56549                  1       < 0.1%
13532                  1       < 0.1%
21126                  1       < 0.1%
20882                  1       < 0.1%
92536                  1       < 0.1%
2892                   1       < 0.1%
20867                  1       < 0.1%
19285                  1       < 0.1%
2153                   1       < 0.1%
Other values (14815)   14815   99.9%

Minimum 10 values

Value   Count   Frequency (%)
7       1       < 0.1%
10      1       < 0.1%
16      1       < 0.1%
18      1       < 0.1%
20      1       < 0.1%
21      1       < 0.1%
23      1       < 0.1%
26      1       < 0.1%
35      1       < 0.1%
37      1       < 0.1%

Maximum 10 values

Value   Count   Frequency (%)
94424   1       < 0.1%
94423   1       < 0.1%
94419   1       < 0.1%
94371   1       < 0.1%
94321   1       < 0.1%
94319   1       < 0.1%
94318   1       < 0.1%
94317   1       < 0.1%
94313   1       < 0.1%
94293   1       < 0.1%

Name
Categorical

Distinct: 13660
Distinct (%): 92.1%
Missing: 0
Missing (%): 0.0%
Memory size: 115.9 KiB

Value                     Count
PLASP Child Care Centre   93
Tim Hortons               57
Subway                    44
Petro Canada              23
Shoppers Drug Mart        22
Other values (13655)      14586

Length

Max length: 111
Median length: 71
Mean length: 22.503541
Min length: 2

Characters and Unicode

Total characters: 333615
Distinct characters: 90
Distinct categories: 14
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13206
Unique (%): 89.1%
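
The report gives two different counts for Name: "Distinct" (13660) and "Unique" (13206). A reading consistent with these figures is that distinct counts every different value, while unique counts only values that occur exactly once. The sketch below illustrates that interpretation on a small, made-up sample; the helper name `distinct_and_unique` is hypothetical and not part of pandas-profiling.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Made-up sample: "Tim Hortons" repeats, so it is distinct but not unique.
sample = ["Tim Hortons", "Subway", "Tim Hortons", "Dollarama", "Starbucks"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)

# Percentages for Name, using the counts reported for this variable.
n_rows = 14825
distinct_pct = 100 * 13660 / n_rows
unique_pct = 100 * 13206 / n_rows
print(f"{distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique")
```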

Sample

1st row: Garderie La Fontaine De I'Amitie
2nd row: KONE Canada Inc.
3rd row: Biesse Canada
4th row: Trimart Corporation
5th row: S A W Technology

Common Values

Value                      Count   Frequency (%)
PLASP Child Care Centre    93      0.6%
Tim Hortons                57      0.4%
Subway                     44      0.3%
Petro Canada               23      0.2%
Shoppers Drug Mart         22      0.1%
Dollarama                  18      0.1%
Starbucks                  18      0.1%
Shell Canada               18      0.1%
Royal Bank of Canada       16      0.1%
Edward Jones Investments   15      0.1%
Other values (13650)       14501   97.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
inc 2981
 
5.7%
1705
 
3.3%
ltd 1548
 
3.0%
canada 895
 
1.7%
centre 645
 
1.2%
and 483
 
0.9%
services 456
 
0.9%
the 435
 
0.8%
corp 391
 
0.7%
of 384
 
0.7%
Other values (12001) 42343
81.0%

Most occurring characters

ValueCountFrequency (%)
37495
 
11.2%
e 25105
 
7.5%
a 24267
 
7.3%
n 21673
 
6.5%
i 19305
 
5.8%
r 19168
 
5.7%
o 18176
 
5.4%
t 17740
 
5.3%
s 14612
 
4.4%
l 11980
 
3.6%
Other values (80) 124094
37.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 232831
69.8%
Uppercase Letter 52759
 
15.8%
Space Separator 37495
 
11.2%
Other Punctuation 8482
 
2.5%
Dash Punctuation 773
 
0.2%
Decimal Number 746
 
0.2%
Close Punctuation 216
 
0.1%
Open Punctuation 216
 
0.1%
Final Punctuation 55
 
< 0.1%
Math Symbol 36
 
< 0.1%
Other values (4) 6
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 25105
10.8%
a 24267
10.4%
n 21673
9.3%
i 19305
 
8.3%
r 19168
 
8.2%
o 18176
 
7.8%
t 17740
 
7.6%
s 14612
 
6.3%
l 11980
 
5.1%
c 11312
 
4.9%
Other values (20) 49493
21.3%
Uppercase Letter
ValueCountFrequency (%)
C 7034
13.3%
S 5524
 
10.5%
I 4527
 
8.6%
M 3442
 
6.5%
L 3439
 
6.5%
P 3374
 
6.4%
A 3346
 
6.3%
T 2945
 
5.6%
D 2433
 
4.6%
B 2119
 
4.0%
Other values (17) 14576
27.6%
Other Punctuation
ValueCountFrequency (%)
. 5839
68.8%
& 1344
 
15.8%
, 568
 
6.7%
' 517
 
6.1%
/ 174
 
2.1%
: 18
 
0.2%
@ 6
 
0.1%
# 5
 
0.1%
; 4
 
< 0.1%
! 4
 
< 0.1%
Other values (2) 3
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 159
21.3%
2 132
17.7%
0 128
17.2%
4 79
10.6%
3 50
 
6.7%
9 49
 
6.6%
8 42
 
5.6%
6 36
 
4.8%
7 36
 
4.8%
5 35
 
4.7%
Math Symbol
ValueCountFrequency (%)
+ 32
88.9%
| 4
 
11.1%
Space Separator
ValueCountFrequency (%)
37495
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 773
100.0%
Close Punctuation
ValueCountFrequency (%)
) 216
100.0%
Open Punctuation
ValueCountFrequency (%)
( 216
100.0%
Final Punctuation
ValueCountFrequency (%)
55
100.0%
Format
ValueCountFrequency (%)
3
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%
Control
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 285590
85.6%
Common 48025
 
14.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 25105
 
8.8%
a 24267
 
8.5%
n 21673
 
7.6%
i 19305
 
6.8%
r 19168
 
6.7%
o 18176
 
6.4%
t 17740
 
6.2%
s 14612
 
5.1%
l 11980
 
4.2%
c 11312
 
4.0%
Other values (47) 102252
35.8%
Common
ValueCountFrequency (%)
37495
78.1%
. 5839
 
12.2%
& 1344
 
2.8%
- 773
 
1.6%
, 568
 
1.2%
' 517
 
1.1%
) 216
 
0.4%
( 216
 
0.4%
/ 174
 
0.4%
1 159
 
0.3%
Other values (23) 724
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 333545
> 99.9%
Punctuation 58
 
< 0.1%
None 12
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
37495
 
11.2%
e 25105
 
7.5%
a 24267
 
7.3%
n 21673
 
6.5%
i 19305
 
5.8%
r 19168
 
5.7%
o 18176
 
5.4%
t 17740
 
5.3%
s 14612
 
4.4%
l 11980
 
3.6%
Other values (73) 124024
37.2%
Punctuation
ValueCountFrequency (%)
55
94.8%
3
 
5.2%
None
ValueCountFrequency (%)
é 6
50.0%
ü 2
 
16.7%
ē 2
 
16.7%
ä 1
 
8.3%
É 1
 
8.3%

EmplRange
Categorical

Distinct: 9
Distinct (%): 0.1%
Missing: 323
Missing (%): 2.2%
Memory size: 115.9 KiB

Value              Count
1 to 4             6562
5 to 9             3080
10 to 19           2048
20 to 49           1556
50 to 99           699
Other values (4)   557

Length

Max length: 10
Median length: 6
Mean length: 6.7455523
Min length: 6

Characters and Unicode

Total characters: 97824
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5 to 9
2nd row: 5 to 9
3rd row: 10 to 19
4th row: 5 to 9
5th row: 1 to 4

Common Values

Value        Count   Frequency (%)
1 to 4       6562    44.3%
5 to 9       3080    20.8%
10 to 19     2048    13.8%
20 to 49     1556    10.5%
50 to 99     699     4.7%
100 to 299   429     2.9%
300 to 499   73      0.5%
500 to 999   33      0.2%
1000 plus    22      0.1%
(Missing)    323     2.2%
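
EmplRange is stored as text, so sorting it alphabetically would put "10 to 19" before "5 to 9". The sketch below is one way to recover a numeric ordering from the labels in the table above. The helper `parse_empl_range` is hypothetical, and it assumes every label has the form "N to M" or "N plus".

```python
import re

# Counts copied from the EmplRange "Common Values" table.
COUNTS = {
    "1 to 4": 6562, "5 to 9": 3080, "10 to 19": 2048,
    "20 to 49": 1556, "50 to 99": 699, "100 to 299": 429,
    "300 to 499": 73, "500 to 999": 33, "1000 plus": 22,
}
MISSING = 323

def parse_empl_range(label):
    """Return (low, high); high is None for an open-ended "N plus" label."""
    text = label.strip()
    m = re.fullmatch(r"(\d+) to (\d+)", text)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = re.fullmatch(r"(\d+) plus", text)
    if m:
        return int(m.group(1)), None
    raise ValueError(f"unrecognised range label: {label!r}")

# Order the labels by their lower bound rather than alphabetically.
ordered = sorted(COUNTS, key=lambda label: parse_empl_range(label)[0])

# The nine categories plus the missing rows should account for every record.
total_rows = sum(COUNTS.values()) + MISSING
print(ordered)
print(total_rows)
```

The category counts and the 323 missing rows add up to the 14825 observations reported in the overview.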

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
to 14480
33.3%
1 6562
15.1%
4 6562
15.1%
5 3080
 
7.1%
9 3080
 
7.1%
10 2048
 
4.7%
19 2048
 
4.7%
20 1556
 
3.6%
49 1556
 
3.6%
99 699
 
1.6%
Other values (9) 1813
 
4.2%

Most occurring characters

ValueCountFrequency (%)
28982
29.6%
t 14480
14.8%
o 14480
14.8%
1 11109
 
11.4%
9 9185
 
9.4%
4 8191
 
8.4%
0 5439
 
5.6%
5 3812
 
3.9%
2 1985
 
2.0%
3 73
 
0.1%
Other values (4) 88
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 39794
40.7%
Lowercase Letter 29048
29.7%
Space Separator 28982
29.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 11109
27.9%
9 9185
23.1%
4 8191
20.6%
0 5439
13.7%
5 3812
 
9.6%
2 1985
 
5.0%
3 73
 
0.2%
Lowercase Letter
ValueCountFrequency (%)
t 14480
49.8%
o 14480
49.8%
p 22
 
0.1%
l 22
 
0.1%
u 22
 
0.1%
s 22
 
0.1%
Space Separator
ValueCountFrequency (%)
28982
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 68776
70.3%
Latin 29048
29.7%

Most frequent character per script

Common
ValueCountFrequency (%)
28982
42.1%
1 11109
 
16.2%
9 9185
 
13.4%
4 8191
 
11.9%
0 5439
 
7.9%
5 3812
 
5.5%
2 1985
 
2.9%
3 73
 
0.1%
Latin
ValueCountFrequency (%)
t 14480
49.8%
o 14480
49.8%
p 22
 
0.1%
l 22
 
0.1%
u 22
 
0.1%
s 22
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 97824
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
28982
29.6%
t 14480
14.8%
o 14480
14.8%
1 11109
 
11.4%
9 9185
 
9.4%
4 8191
 
8.4%
0 5439
 
5.6%
5 3812
 
3.9%
2 1985
 
2.0%
3 73
 
0.1%
Other values (4) 88
 
0.1%

NAICSTitle
Categorical

Distinct: 20
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 115.9 KiB

Value               Count
Retail              2074
Manufacturing       1779
Other Services      1703
Wholesale           1528
Professional        1330
Other values (15)   6411

Length

Max length: 21
Median length: 14
Mean length: 11.02172
Min length: 1

Characters and Unicode

Total characters: 163397
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Health Care
2nd row: Manufacturing
3rd row: Manufacturing
4th row: Finance
5th row: Wholesale

Common Values

Value               Count   Frequency (%)
Retail              2074    14.0%
Manufacturing       1779    12.0%
Other Services      1703    11.5%
Wholesale           1528    10.3%
Professional        1330    9.0%
Health Care         1287    8.7%
Accommodation       1230    8.3%
Transportation      728     4.9%
Finance             604     4.1%
Educational         586     4.0%
Other values (10)   1976    13.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
retail 2074
11.3%
manufacturing 1779
9.7%
other 1703
9.3%
services 1703
9.3%
wholesale 1528
8.4%
professional 1330
 
7.3%
health 1287
 
7.0%
care 1287
 
7.0%
accommodation 1230
 
6.7%
transportation 728
 
4.0%
Other values (14) 3643
19.9%

Most occurring characters

ValueCountFrequency (%)
a 17201
 
10.5%
e 16192
 
9.9%
t 13619
 
8.3%
i 12667
 
7.8%
n 11639
 
7.1%
o 11392
 
7.0%
r 10759
 
6.6%
l 8823
 
5.4%
s 8358
 
5.1%
c 7784
 
4.8%
Other values (25) 44963
27.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 141632
86.7%
Uppercase Letter 18292
 
11.2%
Space Separator 3473
 
2.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 17201
12.1%
e 16192
11.4%
t 13619
9.6%
i 12667
8.9%
n 11639
8.2%
o 11392
8.0%
r 10759
7.6%
l 8823
 
6.2%
s 8358
 
5.9%
c 7784
 
5.5%
Other values (10) 23198
16.4%
Uppercase Letter
ValueCountFrequency (%)
R 2444
13.4%
A 2029
11.1%
M 1877
10.3%
C 1835
10.0%
S 1703
9.3%
O 1703
9.3%
W 1528
8.4%
P 1440
7.9%
H 1287
7.0%
E 956
 
5.2%
Other values (4) 1490
8.1%
Space Separator
ValueCountFrequency (%)
3473
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 159924
97.9%
Common 3473
 
2.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 17201
10.8%
e 16192
 
10.1%
t 13619
 
8.5%
i 12667
 
7.9%
n 11639
 
7.3%
o 11392
 
7.1%
r 10759
 
6.7%
l 8823
 
5.5%
s 8358
 
5.2%
c 7784
 
4.9%
Other values (24) 41490
25.9%
Common
ValueCountFrequency (%)
3473
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 163397
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 17201
 
10.5%
e 16192
 
9.9%
t 13619
 
8.3%
i 12667
 
7.8%
n 11639
 
7.1%
o 11392
 
7.0%
r 10759
 
6.6%
l 8823
 
5.4%
s 8358
 
5.1%
c 7784
 
4.8%
Other values (25) 44963
27.5%

NAICSDescr
Categorical

Distinct: 654
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 115.9 KiB

Value                           Count
Limited-service eating places   760
General Automotive Repair       382
Full-service restaurants        335
Offices of Dentists             302
Offices of Lawyers              267
Other values (649)              12779

Length

Max length: 175
Median length: 70
Mean length: 36.120337
Min length: 6

Characters and Unicode

Total characters: 535484
Distinct characters: 60
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 89
Unique (%): 0.6%

Sample

1st row: Child Day-Care Services
2nd row: Other Metalworking Machinery Manufacturing
3rd row: Sawmill and woodworking machinery manufacturing
4th row: Mortgage and Non-mortgage Loan Brokers
5th row: All Other Machinery, Equipment and Supplies Wholesaler-Distributors

Common Values

Value                                      Count   Frequency (%)
Limited-service eating places              760     5.1%
General Automotive Repair                  382     2.6%
Full-service restaurants                   335     2.3%
Offices of Dentists                        302     2.0%
Offices of Lawyers                         267     1.8%
Beauty Salons                              250     1.7%
Offices of Physicians                      250     1.7%
Elementary and Secondary Schools           234     1.6%
Other Freight Transportation Arrangement   231     1.6%
Religious Organizations                    227     1.5%
Other values (644)                         11587   78.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
and 6409
 
9.9%
other 3539
 
5.5%
stores 1775
 
2.8%
services 1769
 
2.7%
offices 1649
 
2.6%
all 1636
 
2.5%
of 1594
 
2.5%
wholesaler-distributors 1458
 
2.3%
manufacturing 1334
 
2.1%
supplies 851
 
1.3%
Other values (908) 42495
65.9%

Most occurring characters

ValueCountFrequency (%)
e 54598
 
10.2%
49952
 
9.3%
i 38404
 
7.2%
r 36620
 
6.8%
t 35330
 
6.6%
a 35195
 
6.6%
n 34988
 
6.5%
s 31283
 
5.8%
o 26659
 
5.0%
l 22311
 
4.2%
Other values (50) 170144
31.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 425141
79.4%
Uppercase Letter 52834
 
9.9%
Space Separator 50119
 
9.4%
Dash Punctuation 3582
 
0.7%
Other Punctuation 2052
 
0.4%
Close Punctuation 823
 
0.2%
Open Punctuation 823
 
0.2%
Control 110
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 54598
12.8%
i 38404
9.0%
r 36620
8.6%
t 35330
 
8.3%
a 35195
 
8.3%
n 34988
 
8.2%
s 31283
 
7.4%
o 26659
 
6.3%
l 22311
 
5.2%
c 20540
 
4.8%
Other values (16) 89213
21.0%
Uppercase Letter
ValueCountFrequency (%)
S 7461
14.1%
O 5806
11.0%
A 4831
 
9.1%
C 4686
 
8.9%
M 4099
 
7.8%
P 3537
 
6.7%
D 2915
 
5.5%
W 2317
 
4.4%
E 2183
 
4.1%
L 2165
 
4.1%
Other values (14) 12834
24.3%
Other Punctuation
ValueCountFrequency (%)
, 1779
86.7%
' 132
 
6.4%
& 93
 
4.5%
. 48
 
2.3%
Space Separator
ValueCountFrequency (%)
49952
99.7%
  167
 
0.3%
Dash Punctuation
ValueCountFrequency (%)
- 3582
100.0%
Close Punctuation
ValueCountFrequency (%)
) 823
100.0%
Open Punctuation
ValueCountFrequency (%)
( 823
100.0%
Control
ValueCountFrequency (%)
110
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 477975
89.3%
Common 57509
 
10.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 54598
 
11.4%
i 38404
 
8.0%
r 36620
 
7.7%
t 35330
 
7.4%
a 35195
 
7.4%
n 34988
 
7.3%
s 31283
 
6.5%
o 26659
 
5.6%
l 22311
 
4.7%
c 20540
 
4.3%
Other values (40) 142047
29.7%
Common
ValueCountFrequency (%)
49952
86.9%
- 3582
 
6.2%
, 1779
 
3.1%
) 823
 
1.4%
( 823
 
1.4%
  167
 
0.3%
' 132
 
0.2%
110
 
0.2%
& 93
 
0.2%
. 48
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 535317
> 99.9%
None 167
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 54598
 
10.2%
49952
 
9.3%
i 38404
 
7.2%
r 36620
 
6.8%
t 35330
 
6.6%
a 35195
 
6.6%
n 34988
 
6.5%
s 31283
 
5.8%
o 26659
 
5.0%
l 22311
 
4.2%
Other values (49) 169977
31.8%
None
ValueCountFrequency (%)
  167
100.0%

NAICSCode
Real number (ℝ)

Distinct: 654
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 535639.94
Minimum: 1
Maximum: 913910
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 115.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 313214
Q1: 418410
Median: 524299
Q3: 621390
95-th percentile: 812116
Maximum: 913910
Range: 913909
Interquartile range (IQR): 202980

Descriptive statistics

Standard deviation: 159137.93
Coefficient of variation (CV): 0.29709869
Kurtosis: -0.64746315
Mean: 535639.94
Median Absolute Deviation (MAD): 97211
Skewness: 0.27043917
Sum: 7.9408622 × 10^9
Variance: 2.532488 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
722512               760     5.1%
811111               382     2.6%
722511               335     2.3%
621210               302     2.0%
541110               267     1.8%
812115               250     1.7%
621110               250     1.7%
611110               234     1.6%
488519               231     1.6%
813110               227     1.5%
Other values (644)   11587   78.2%

Minimum 10 values

Value    Count   Frequency (%)
1        3       < 0.1%
112999   1       < 0.1%
115110   1       < 0.1%
212299   2       < 0.1%
213119   2       < 0.1%
221119   2       < 0.1%
221122   7       < 0.1%
221210   2       < 0.1%
221310   5       < 0.1%
236110   76      0.5%

Maximum 10 values

Value    Count   Frequency (%)
913910   20      0.1%
913140   21      0.1%
913130   1       < 0.1%
912910   8       0.1%
912210   5       < 0.1%
912190   3       < 0.1%
912130   1       < 0.1%
911910   7       < 0.1%
911410   1       < 0.1%
911320   35      0.2%
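
NAICSCode is profiled as a real number, but it is a classification code rather than a quantity, so its mean and variance are not meaningful. The extreme values also show three rows with the code 1, which is not a six-digit NAICS code. The sketch below extracts the two-digit sector prefix and flags codes that are not six digits long. The helper `naics_sector` is hypothetical, and treating the code 1 as a placeholder is an assumption.

```python
def naics_sector(code):
    """Return the two-digit sector prefix, or None if the code is not six digits."""
    text = str(int(code))      # int() also accepts float-typed codes such as 913910.0
    if len(text) != 6:
        return None            # assumption: anything else is a placeholder or an error
    return text[:2]

# Codes taken from the value tables, including the suspicious value 1.
codes = [722512, 811111, 1, 913910.0]
sectors = [naics_sector(c) for c in codes]
print(sectors)
```

Rows that map to None would merit a manual look before any sector-level aggregation.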

Phone
Categorical

Distinct: 14090
Distinct (%): 95.0%
Missing: 0
Missing (%): 0.0%
Memory size: 115.9 KiB

Value                  Count
(blank)                415
905-615-3200           11
905-624-3811           7
905-615-3777           5
905-896-0210           5
Other values (14085)   14382

Length

Max length: 20
Median length: 12
Mean length: 11.691939
Min length: 1

Characters and Unicode

Total characters: 173333
Distinct characters: 16
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13812
Unique (%): 93.2%

Sample

1st row: 905-822-8902
2nd row: 905-820-6034
3rd row: 416-525-9110
4th row: 905-820-6711
5th row: 905-567-1804

Common Values

Value                  Count   Frequency (%)
(blank)                415     2.8%
905-615-3200           11      0.1%
905-624-3811           7       < 0.1%
905-615-3777           5       < 0.1%
905-896-0210           5       < 0.1%
905-785-8928           4       < 0.1%
647-484-4372           4       < 0.1%
905-615-4750           4       < 0.1%
905-949-2222           4       < 0.1%
905-615-4640           4       < 0.1%
Other values (14080)   14362   96.9%
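
The most common Phone value is a blank entry (415 rows, 2.8%), yet the variable reports 0 missing values. The minimum length of 1 suggests the blank is a whitespace placeholder rather than a true null, which would explain why it is not counted as missing. The same pattern appears in Fax, TollFree, EMail, and WebAddress. The sketch below shows one way to normalise such placeholders before profiling; the helper `blank_to_none` and the sample list are hypothetical.

```python
def blank_to_none(value):
    """Map None, empty, or whitespace-only strings to None; strip everything else."""
    if value is None:
        return None
    stripped = value.strip()
    return stripped or None

# Made-up sample mixing real numbers with blank placeholders.
phones = ["905-822-8902", " ", "905-820-6034", "", " "]
cleaned = [blank_to_none(p) for p in phones]
missing = sum(v is None for v in cleaned)
print(cleaned, missing)

# Blank counts copied from the report's per-variable summaries.
n_rows = 14825
blank_counts = {"Phone": 415, "Fax": 6333, "TollFree": 12606,
                "EMail": 4904, "WebAddress": 3853}
blank_pct = {name: round(100 * count / n_rows, 1)
             for name, count in blank_counts.items()}
print(blank_pct)
```

If these blanks were treated as missing, the dataset-level missing-cell count would be far higher than the 333 reported in the overview.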

Length

Histogram of lengths of the category
ValueCountFrequency (%)
905-615-3200 11
 
0.1%
905-624-3811 7
 
< 0.1%
905-615-3777 5
 
< 0.1%
905-896-0210 5
 
< 0.1%
905-785-8928 4
 
< 0.1%
647-484-4372 4
 
< 0.1%
905-615-4750 4
 
< 0.1%
905-949-2222 4
 
< 0.1%
905-615-4640 4
 
< 0.1%
905-567-4032 3
 
< 0.1%
Other values (14083) 14364
99.6%

Most occurring characters

ValueCountFrequency (%)
- 28804
16.6%
0 25489
14.7%
5 22011
12.7%
9 21510
12.4%
6 13503
7.8%
2 13310
7.7%
7 11498
 
6.6%
8 11303
 
6.5%
1 9183
 
5.3%
4 9051
 
5.2%
Other values (6) 7671
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 144104
83.1%
Dash Punctuation 28806
 
16.6%
Space Separator 420
 
0.2%
Lowercase Letter 2
 
< 0.1%
Uppercase Letter 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 25489
17.7%
5 22011
15.3%
9 21510
14.9%
6 13503
9.4%
2 13310
9.2%
7 11498
8.0%
8 11303
7.8%
1 9183
 
6.4%
4 9051
 
6.3%
3 7246
 
5.0%
Dash Punctuation
ValueCountFrequency (%)
- 28804
> 99.9%
2
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
x 1
50.0%
t 1
50.0%
Space Separator
ValueCountFrequency (%)
420
100.0%
Uppercase Letter
ValueCountFrequency (%)
E 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 173330
> 99.9%
Latin 3
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
- 28804
16.6%
0 25489
14.7%
5 22011
12.7%
9 21510
12.4%
6 13503
7.8%
2 13310
7.7%
7 11498
 
6.6%
8 11303
 
6.5%
1 9183
 
5.3%
4 9051
 
5.2%
Other values (3) 7668
 
4.4%
Latin
ValueCountFrequency (%)
E 1
33.3%
x 1
33.3%
t 1
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 173331
> 99.9%
Punctuation 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 28804
16.6%
0 25489
14.7%
5 22011
12.7%
9 21510
12.4%
6 13503
7.8%
2 13310
7.7%
7 11498
 
6.6%
8 11303
 
6.5%
1 9183
 
5.3%
4 9051
 
5.2%
Other values (5) 7669
 
4.4%
Punctuation
ValueCountFrequency (%)
2
100.0%

Fax
Categorical

Distinct: 8296
Distinct (%): 56.0%
Missing: 0
Missing (%): 0.0%
Memory size: 115.9 KiB

Value                 Count
(blank)               6333
905-822-2673          8
905-361-6401          8
905-502-6982          5
905-896-9380          5
Other values (8291)   8466

Length

Max length: 14
Median length: 12
Mean length: 7.3377403
Min length: 1

Characters and Unicode

Total characters: 108782
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8127
Unique (%): 54.8%

Sample

1st row: (blank)
2nd row: 905-820-7189
3rd row: 450-477-0484
4th row: 905-820-5669
5th row: (blank)

Common Values

Value                 Count   Frequency (%)
(blank)               6333    42.7%
905-822-2673          8       0.1%
905-361-6401          8       0.1%
905-502-6982          5       < 0.1%
905-896-9380          5       < 0.1%
1-855-552-7329        4       < 0.1%
905-625-4815          4       < 0.1%
905-819-1331          3       < 0.1%
1-888-550-6922        3       < 0.1%
905-403-8409          3       < 0.1%
Other values (8286)   8449    57.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
905-822-2673 8
 
0.1%
905-361-6401 8
 
0.1%
905-502-6982 5
 
0.1%
905-896-9380 5
 
0.1%
1-855-552-7329 4
 
< 0.1%
905-625-4815 4
 
< 0.1%
905-828-0617 3
 
< 0.1%
905-625-8815 3
 
< 0.1%
905-542-0987 3
 
< 0.1%
905-306-7542 3
 
< 0.1%
Other values (8286) 8447
99.5%

Most occurring characters

ValueCountFrequency (%)
- 17255
15.9%
0 13957
12.8%
5 13647
12.5%
9 13120
12.1%
6 8331
7.7%
2 7727
7.1%
8 6925
6.4%
7 6625
 
6.1%
6334
 
5.8%
1 5498
 
5.1%
Other values (2) 9363
8.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 85193
78.3%
Dash Punctuation 17255
 
15.9%
Space Separator 6334
 
5.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 13957
16.4%
5 13647
16.0%
9 13120
15.4%
6 8331
9.8%
2 7727
9.1%
8 6925
8.1%
7 6625
7.8%
1 5498
 
6.5%
4 4888
 
5.7%
3 4475
 
5.3%
Dash Punctuation
ValueCountFrequency (%)
- 17255
100.0%
Space Separator
ValueCountFrequency (%)
6334
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 108782
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
- 17255
15.9%
0 13957
12.8%
5 13647
12.5%
9 13120
12.1%
6 8331
7.7%
2 7727
7.1%
8 6925
6.4%
7 6625
 
6.1%
6334
 
5.8%
1 5498
 
5.1%
Other values (2) 9363
8.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 108782
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 17255
15.9%
0 13957
12.8%
5 13647
12.5%
9 13120
12.1%
6 8331
7.7%
2 7727
7.1%
8 6925
6.4%
7 6625
 
6.1%
6334
 
5.8%
1 5498
 
5.1%
Other values (2) 9363
8.6%

TollFree
Categorical

Distinct: 2149
Distinct (%): 14.5%
Missing: 0
Missing (%): 0.0%
Memory size: 115.9 KiB

Value                 Count
(blank)               12606
1-800-465-2422        8
1-800-769-2511        7
1-800-472-6842        5
1-877-849-3637        4
Other values (2144)   2195

Length

Max length: 14
Median length: 1
Mean length: 2.9458347
Min length: 1

Characters and Unicode

Total characters: 43672
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2098
Unique (%): 14.2%

Sample

1st row: (blank)
2nd row: (blank)
3rd row: 1-800-598-3202
4th row: (blank)
5th row: (blank)

Common Values

Value                 Count   Frequency (%)
(blank)               12606   85.0%
1-800-465-2422        8       0.1%
1-800-769-2511        7       < 0.1%
1-800-472-6842        5       < 0.1%
1-877-849-3637        4       < 0.1%
1-855-552-7467        4       < 0.1%
1-866-607-6301        3       < 0.1%
1-888-944-6539        3       < 0.1%
1-877-777-8672        3       < 0.1%
1-888-571-2627        2       < 0.1%
Other values (2139)   2180    14.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1-800-465-2422 8
 
0.4%
1-800-769-2511 7
 
0.3%
1-800-472-6842 5
 
0.2%
1-877-849-3637 4
 
0.2%
1-855-552-7467 4
 
0.2%
1-866-607-6301 3
 
0.1%
1-888-944-6539 3
 
0.1%
1-877-777-8672 3
 
0.1%
1-855-696-7227 2
 
0.1%
1-844-593-9707 2
 
0.1%
Other values (2143) 2183
98.2%

Most occurring characters

ValueCountFrequency (%)
12611
28.9%
- 6651
15.2%
8 4677
 
10.7%
1 3345
 
7.7%
6 2833
 
6.5%
0 2739
 
6.3%
7 2471
 
5.7%
2 1925
 
4.4%
5 1898
 
4.3%
3 1710
 
3.9%
Other values (3) 2812
 
6.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 24409
55.9%
Space Separator 12611
28.9%
Dash Punctuation 6652
 
15.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
8 4677
19.2%
1 3345
13.7%
6 2833
11.6%
0 2739
11.2%
7 2471
10.1%
2 1925
7.9%
5 1898
7.8%
3 1710
 
7.0%
4 1548
 
6.3%
9 1263
 
5.2%
Dash Punctuation
ValueCountFrequency (%)
- 6651
> 99.9%
1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
12611
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 43672
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
12611
28.9%
- 6651
15.2%
8 4677
 
10.7%
1 3345
 
7.7%
6 2833
 
6.5%
0 2739
 
6.3%
7 2471
 
5.7%
2 1925
 
4.4%
5 1898
 
4.3%
3 1710
 
3.9%
Other values (3) 2812
 
6.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 43671
> 99.9%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
12611
28.9%
- 6651
15.2%
8 4677
 
10.7%
1 3345
 
7.7%
6 2833
 
6.5%
0 2739
 
6.3%
7 2471
 
5.7%
2 1925
 
4.4%
5 1898
 
4.3%
3 1710
 
3.9%
Other values (2) 2811
 
6.4%
Punctuation
ValueCountFrequency (%)
1
100.0%

EMail
Categorical

Distinct: 9793
Distinct (%): 66.1%
Missing: 0
Missing (%): 0.0%
Memory size: 115.9 KiB

Value                              Count
(blank)                            4904
info@taxwide.com                   5
insure@all-risks.com               4
info@mississaugaschoolofmusic.ca   3
info@publicstoragecanada.com       3
Other values (9788)                9906

Length

Max length: 71
Median length: 50
Mean length: 15.419831
Min length: 1

Characters and Unicode

Total characters: 228599
Distinct characters: 73
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9670
Unique (%): 65.2%

Sample

1st row: contact@lafontaindeamitie.ca
2nd row: koneservice@kone.com
3rd row: matt.fleming@biessecanada.com
4th row: Priority@trimart.ca
5th row: mark@sawtechnology.com

Common Values

Value                              Count   Frequency (%)
(blank)                            4904    33.1%
info@taxwide.com                   5       < 0.1%
insure@all-risks.com               4       < 0.1%
info@mississaugaschoolofmusic.ca   3       < 0.1%
info@publicstoragecanada.com       3       < 0.1%
chaseautogta@gmail.com             2       < 0.1%
inquiry@frendel.com                2       < 0.1%
info@oscarservice.com              2       < 0.1%
pharmahealth@pharmahealth.ca       2       < 0.1%
info@karachikitchen.com            2       < 0.1%
Other values (9783)                9896    66.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
info@taxwide.com 5
 
0.1%
insure@all-risks.com 4
 
< 0.1%
info@mississaugaschoolofmusic.ca 3
 
< 0.1%
info@publicstoragecanada.com 3
 
< 0.1%
bobchambersltd@bellnet.ca 2
 
< 0.1%
info@greatmountainginseng.com 2
 
< 0.1%
westport@westportfrt.com 2
 
< 0.1%
swil@adworksmailing.com 2
 
< 0.1%
info@letreport.com 2
 
< 0.1%
lindalaakso@customorthotic.ca 2
 
< 0.1%
Other values (9787) 9913
99.7%

Most occurring characters

ValueCountFrequency (%)
a 20601
 
9.0%
o 20540
 
9.0%
c 17366
 
7.6%
i 15559
 
6.8%
e 14868
 
6.5%
m 13466
 
5.9%
n 13088
 
5.7%
s 12021
 
5.3%
r 10961
 
4.8%
. 10838
 
4.7%
Other values (63) 79291
34.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 198805
87.0%
Other Punctuation 20770
 
9.1%
Space Separator 4958
 
2.2%
Decimal Number 2601
 
1.1%
Uppercase Letter 935
 
0.4%
Dash Punctuation 374
 
0.2%
Connector Punctuation 155
 
0.1%
Final Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 20601
10.4%
o 20540
10.3%
c 17366
 
8.7%
i 15559
 
7.8%
e 14868
 
7.5%
m 13466
 
6.8%
n 13088
 
6.6%
s 12021
 
6.0%
r 10961
 
5.5%
t 10400
 
5.2%
Other values (16) 49935
25.1%
Uppercase Letter
ValueCountFrequency (%)
I 166
17.8%
S 112
12.0%
M 102
10.9%
A 77
 
8.2%
C 58
 
6.2%
D 47
 
5.0%
P 39
 
4.2%
B 39
 
4.2%
R 37
 
4.0%
J 34
 
3.6%
Other values (16) 224
24.0%
Decimal Number
ValueCountFrequency (%)
1 484
18.6%
0 422
16.2%
2 389
15.0%
3 222
8.5%
4 205
7.9%
5 196
7.5%
7 177
 
6.8%
8 173
 
6.7%
6 170
 
6.5%
9 163
 
6.3%
Other Punctuation
ValueCountFrequency (%)
. 10838
52.2%
@ 9915
47.7%
& 6
 
< 0.1%
/ 5
 
< 0.1%
, 3
 
< 0.1%
' 2
 
< 0.1%
: 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
4958
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 374
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 155
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 199740
87.4%
Common 28859
 
12.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 20601
10.3%
o 20540
10.3%
c 17366
 
8.7%
i 15559
 
7.8%
e 14868
 
7.4%
m 13466
 
6.7%
n 13088
 
6.6%
s 12021
 
6.0%
r 10961
 
5.5%
t 10400
 
5.2%
Other values (42) 50870
25.5%
Common
ValueCountFrequency (%)
. 10838
37.6%
@ 9915
34.4%
4958
17.2%
1 484
 
1.7%
0 422
 
1.5%
2 389
 
1.3%
- 374
 
1.3%
3 222
 
0.8%
4 205
 
0.7%
5 196
 
0.7%
Other values (11) 856
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 228598
> 99.9%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 20601
 
9.0%
o 20540
 
9.0%
c 17366
 
7.6%
i 15559
 
6.8%
e 14868
 
6.5%
m 13466
 
5.9%
n 13088
 
5.7%
s 12021
 
5.3%
r 10961
 
4.8%
. 10838
 
4.7%
Other values (62) 79290
34.7%
Punctuation
ValueCountFrequency (%)
1
100.0%

WebAddress
Categorical

Distinct9717
Distinct (%)65.5%
Missing0
Missing (%)0.0%
Memory size115.9 KiB
3853 
www.timhortons.com
 
45
www.dpcdsb.org
 
43
www.subway.com
 
43
www.petro-canada.ca
 
22
Other values (9712)
10819 

Length

Max length 50
Median length 43
Mean length 14.746712
Min length 1

Characters and Unicode

Total characters 218620
Distinct characters 69
Distinct categories 10
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
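
The category breakdowns in this report can be approximated with Python's standard `unicodedata` module. The following is a minimal sketch, not the profiler's actual implementation; the `category_counts` helper and the three-string sample are hypothetical and only illustrate the idea.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Pc": "Connector Punctuation",
    "Zs": "Space Separator",
    "Sm": "Math Symbol",
    "Sk": "Modifier Symbol",
    "Cc": "Control",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unlisted categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Hypothetical sample: two URLs plus a single-space placeholder value.
sample = ["www.kone.ca", "www.trimart.ca", " "]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(name, count)
```

Run over a full column, the same tally would be expected to give figures comparable to the "Most occurring categories" table, with placeholder values showing up under Space Separator.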

Unique

Unique 9157
Unique (%) 61.8%

Sample

1st row www.lafontaindeamitie.ca
2nd row www.kone.ca
3rd row www.biessecanada.com
4th row www.trimart.ca
5th row www.sawtechnology.com

Common Values

Value Count Frequency (%)
(blank) 3853 26.0%
www.timhortons.com 45 0.3%
www.dpcdsb.org 43 0.3%
www.subway.com 43 0.3%
www.petro-canada.ca 22 0.1%
www.shoppersdrugmart.ca 21 0.1%
www.mississauga.ca/portal/residents/fire 19 0.1%
www.shell.ca 18 0.1%
www.starbucks.ca 17 0.1%
www.dollarama.com 16 0.1%
Other values (9707) 10728 72.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
www.timhortons.com 45
 
0.4%
www.subway.com 43
 
0.4%
www.dpcdsb.org 43
 
0.4%
www.petro-canada.ca 22
 
0.2%
www.shoppersdrugmart.ca 21
 
0.2%
www.mississauga.ca/portal/residents/fire 19
 
0.2%
www.shell.ca 18
 
0.2%
www.starbucks.ca 17
 
0.2%
www.dollarama.com 16
 
0.1%
www.edwardjones.com 15
 
0.1%
Other values (9704) 10721
97.6%

Most occurring characters

ValueCountFrequency (%)
w 34475
15.8%
. 22162
 
10.1%
c 17377
 
7.9%
a 17013
 
7.8%
o 15585
 
7.1%
e 12604
 
5.8%
m 10716
 
4.9%
s 9797
 
4.5%
i 9771
 
4.5%
r 9639
 
4.4%
Other values (59) 59481
27.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 191232
87.5%
Other Punctuation 22333
 
10.2%
Space Separator 3860
 
1.8%
Dash Punctuation 508
 
0.2%
Decimal Number 473
 
0.2%
Uppercase Letter 201
 
0.1%
Math Symbol 9
 
< 0.1%
Control 2
 
< 0.1%
Modifier Symbol 1
 
< 0.1%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
w 34475
18.0%
c 17377
 
9.1%
a 17013
 
8.9%
o 15585
 
8.1%
e 12604
 
6.6%
m 10716
 
5.6%
s 9797
 
5.1%
i 9771
 
5.1%
r 9639
 
5.0%
t 9156
 
4.8%
Other values (16) 45099
23.6%
Uppercase Letter
ValueCountFrequency (%)
W 30
14.9%
C 21
 
10.4%
M 13
 
6.5%
S 13
 
6.5%
R 12
 
6.0%
A 11
 
5.5%
P 10
 
5.0%
F 10
 
5.0%
I 10
 
5.0%
L 10
 
5.0%
Other values (13) 61
30.3%
Decimal Number
ValueCountFrequency (%)
1 102
21.6%
2 86
18.2%
0 74
15.6%
4 63
13.3%
3 39
 
8.2%
6 27
 
5.7%
9 23
 
4.9%
5 22
 
4.7%
8 22
 
4.7%
7 15
 
3.2%
Other Punctuation
ValueCountFrequency (%)
. 22162
99.2%
/ 165
 
0.7%
& 4
 
< 0.1%
\ 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
3860
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 508
100.0%
Math Symbol
ValueCountFrequency (%)
~ 9
100.0%
Control
ValueCountFrequency (%)
2
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 191433
87.6%
Common 27187
 
12.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
w 34475
18.0%
c 17377
 
9.1%
a 17013
 
8.9%
o 15585
 
8.1%
e 12604
 
6.6%
m 10716
 
5.6%
s 9797
 
5.1%
i 9771
 
5.1%
r 9639
 
5.0%
t 9156
 
4.8%
Other values (39) 45300
23.7%
Common
ValueCountFrequency (%)
. 22162
81.5%
3860
 
14.2%
- 508
 
1.9%
/ 165
 
0.6%
1 102
 
0.4%
2 86
 
0.3%
0 74
 
0.3%
4 63
 
0.2%
3 39
 
0.1%
6 27
 
0.1%
Other values (10) 101
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 218620
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
w 34475
15.8%
. 22162
 
10.1%
c 17377
 
7.9%
a 17013
 
7.8%
o 15585
 
7.1%
e 12604
 
5.8%
m 10716
 
4.9%
s 9797
 
4.5%
i 9771
 
4.5%
r 9639
 
4.4%
Other values (59) 59481
27.2%

StreetNo
Real number (ℝ)

Distinct 2795
Distinct (%) 18.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 2965.1375
Minimum 1
Maximum 7895
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 115.9 KiB

Quantile statistics

Minimum 1
5-th percentile 56
Q1 1050
median 2399
Q3 5155
95-th percentile 7071
Maximum 7895
Range 7894
Interquartile range (IQR) 4105

Descriptive statistics

Standard deviation 2370.3055
Coefficient of variation (CV) 0.7993914
Kurtosis -1.0780181
Mean 2965.1375
Median Absolute Deviation (MAD) 1680
Skewness 0.51402614
Sum 43958164
Variance 5618347.9
Monotonicity Not monotonic
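
Several of the derived figures for StreetNo follow directly from the others. The sketch below recomputes them from the rounded values printed in this report, assuming the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared). The variable names are illustrative only.

```python
# Figures copied from the StreetNo summary (already rounded by the report).
mean = 2965.1375
std_dev = 2370.3055
q1, q3 = 1050, 5155
minimum, maximum = 1, 7895

value_range = maximum - minimum  # reported Range: 7894
iqr = q3 - q1                    # reported IQR: 4105
cv = std_dev / mean              # reported CV: 0.7993914
variance = std_dev ** 2          # reported Variance: 5618347.9

print(value_range, iqr, cv, variance)
```

Because the inputs are rounded, the recomputed CV and variance should only be expected to agree with the reported values up to rounding error.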
Histogram with fixed size bins (bins=50)
Common values

Value Count Frequency (%)
100 231 1.6%
5100 111 0.7%
7205 106 0.7%
1 89 0.6%
1250 79 0.5%
1550 72 0.5%
2425 64 0.4%
50 59 0.4%
4141 56 0.4%
2355 54 0.4%
Other values (2785) 13904 93.8%

Minimum 10 values

Value Count Frequency (%)
1 89 0.6%
2 40 0.3%
3 34 0.2%
4 30 0.2%
5 1 < 0.1%
6 7 < 0.1%
7 4 < 0.1%
8 5 < 0.1%
9 4 < 0.1%
10 26 0.2%

Maximum 10 values

Value Count Frequency (%)
7895 28 0.2%
7890 2 < 0.1%
7885 15 0.1%
7880 1 < 0.1%
7875 7 < 0.1%
7860 1 < 0.1%
7855 1 < 0.1%
7840 1 < 0.1%
7830 1 < 0.1%
7825 1 < 0.1%

StreetName
Categorical

Distinct 604
Distinct (%) 4.1%
Missing 0
Missing (%) 0.0%
Memory size 115.9 KiB

Dundas St E 617
Matheson Blvd E 405
Dixie Rd 380
Hurontario St 328
Dundas St W 306
Other values (599) 12789

Length

Max length 26
Median length 21
Mean length 11.932277
Min length 3

Characters and Unicode

Total characters 176896
Distinct characters 53
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 139
Unique (%) 0.9%

Sample

1st row Lewisham Dr
2nd row Laird Rd
3rd row Laird Rd
4th row Laird Rd
5th row Laird Rd

Common Values

Value Count Frequency (%)
Dundas St E 617 4.2%
Matheson Blvd E 405 2.7%
Dixie Rd 380 2.6%
Hurontario St 328 2.2%
Dundas St W 306 2.1%
City Centre Dr 294 2.0%
Lakeshore Rd E 291 2.0%
Britannia Rd E 276 1.9%
Tomken Rd 271 1.8%
Argentia Rd 261 1.8%
Other values (594) 11396 76.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
rd 5431
 
15.4%
dr 3459
 
9.8%
e 2305
 
6.5%
st 1832
 
5.2%
blvd 1541
 
4.4%
w 1354
 
3.8%
dundas 927
 
2.6%
ave 748
 
2.1%
matheson 505
 
1.4%
pky 492
 
1.4%
Other values (615) 16650
47.2%

Most occurring characters

ValueCountFrequency (%)
20419
 
11.5%
r 14698
 
8.3%
e 13581
 
7.7%
a 11177
 
6.3%
d 10648
 
6.0%
n 9424
 
5.3%
t 9016
 
5.1%
i 8465
 
4.8%
o 6875
 
3.9%
l 6159
 
3.5%
Other values (43) 66434
37.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 120862
68.3%
Uppercase Letter 35517
 
20.1%
Space Separator 20419
 
11.5%
Dash Punctuation 87
 
< 0.1%
Other Punctuation 11
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 14698
12.2%
e 13581
11.2%
a 11177
9.2%
d 10648
8.8%
n 9424
 
7.8%
t 9016
 
7.5%
i 8465
 
7.0%
o 6875
 
5.7%
l 6159
 
5.1%
s 5265
 
4.4%
Other values (15) 25554
21.1%
Uppercase Letter
ValueCountFrequency (%)
R 6026
17.0%
D 5629
15.8%
S 3502
9.9%
E 3123
8.8%
B 2767
7.8%
C 2523
7.1%
W 2209
 
6.2%
M 1806
 
5.1%
A 1792
 
5.0%
T 1265
 
3.6%
Other values (14) 4875
13.7%
Other Punctuation
ValueCountFrequency (%)
' 10
90.9%
. 1
 
9.1%
Space Separator
ValueCountFrequency (%)
20419
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 87
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 156379
88.4%
Common 20517
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 14698
 
9.4%
e 13581
 
8.7%
a 11177
 
7.1%
d 10648
 
6.8%
n 9424
 
6.0%
t 9016
 
5.8%
i 8465
 
5.4%
o 6875
 
4.4%
l 6159
 
3.9%
R 6026
 
3.9%
Other values (39) 60310
38.6%
Common
ValueCountFrequency (%)
20419
99.5%
- 87
 
0.4%
' 10
 
< 0.1%
. 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 176896
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
20419
 
11.5%
r 14698
 
8.3%
e 13581
 
7.7%
a 11177
 
6.3%
d 10648
 
6.0%
n 9424
 
5.3%
t 9016
 
5.1%
i 8465
 
4.8%
o 6875
 
3.9%
l 6159
 
3.5%
Other values (43) 66434
37.6%

Address
Categorical

Distinct 5587
Distinct (%) 37.7%
Missing 0
Missing (%) 0.0%
Memory size 115.9 KiB

100 City Centre Dr 201
7205 Goreway Dr 98
5100 Erin Mills Pky 96
1250 South Service Rd 66
1550 South Gateway Rd 58
Other values (5582) 14306

Length

Max length 32
Median length 27
Mean length 16.6143
Min length 5

Characters and Unicode

Total characters 246307
Distinct characters 63
Distinct categories 6
Distinct scripts 2
Distinct blocks 1

Unique

Unique 3652
Unique (%) 24.6%

Sample

1st row 1445 Lewisham Dr
2nd row 3505 Laird Rd
3rd row 3505 Laird Rd
4th row 3505 Laird Rd
5th row 3505 Laird Rd

Common Values

Value Count Frequency (%)
100 City Centre Dr 201 1.4%
7205 Goreway Dr 98 0.7%
5100 Erin Mills Pky 96 0.6%
1250 South Service Rd 66 0.4%
1550 South Gateway Rd 58 0.4%
50 Burnhamthorpe Rd W 44 0.3%
4141 Dixie Rd 44 0.3%
2355 Derry Rd E 44 0.3%
2225 Erin Mills Pky 43 0.3%
2425 Matheson Blvd E 40 0.3%
Other values (5577) 14091 95.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
rd 5430
 
10.8%
dr 3458
 
6.9%
e 2304
 
4.6%
st 1832
 
3.7%
blvd 1541
 
3.1%
w 1355
 
2.7%
dundas 927
 
1.9%
ave 749
 
1.5%
matheson 505
 
1.0%
pky 492
 
1.0%
Other values (3412) 31478
62.9%

Most occurring characters

ValueCountFrequency (%)
35246
 
14.3%
r 14695
 
6.0%
e 13581
 
5.5%
a 11180
 
4.5%
d 10645
 
4.3%
0 9797
 
4.0%
n 9426
 
3.8%
5 9109
 
3.7%
t 9017
 
3.7%
i 8466
 
3.4%
Other values (53) 115145
46.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 120855
49.1%
Decimal Number 54576
22.2%
Uppercase Letter 35532
 
14.4%
Space Separator 35246
 
14.3%
Dash Punctuation 87
 
< 0.1%
Other Punctuation 11
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 14695
12.2%
e 13581
11.2%
a 11180
9.3%
d 10645
8.8%
n 9426
 
7.8%
t 9017
 
7.5%
i 8466
 
7.0%
o 6875
 
5.7%
l 6161
 
5.1%
s 5264
 
4.4%
Other values (15) 25545
21.1%
Uppercase Letter
ValueCountFrequency (%)
R 6028
17.0%
D 5631
15.8%
S 3502
9.9%
E 3123
8.8%
B 2770
7.8%
C 2529
7.1%
W 2210
 
6.2%
M 1806
 
5.1%
A 1793
 
5.0%
T 1265
 
3.6%
Other values (14) 4875
13.7%
Decimal Number
ValueCountFrequency (%)
0 9797
18.0%
5 9109
16.7%
1 7837
14.4%
2 5976
10.9%
3 4771
8.7%
6 4396
8.1%
7 3938
7.2%
4 3294
 
6.0%
9 2761
 
5.1%
8 2697
 
4.9%
Other Punctuation
ValueCountFrequency (%)
' 10
90.9%
. 1
 
9.1%
Space Separator
ValueCountFrequency (%)
35246
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 87
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 156387
63.5%
Common 89920
36.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 14695
 
9.4%
e 13581
 
8.7%
a 11180
 
7.1%
d 10645
 
6.8%
n 9426
 
6.0%
t 9017
 
5.8%
i 8466
 
5.4%
o 6875
 
4.4%
l 6161
 
3.9%
R 6028
 
3.9%
Other values (39) 60313
38.6%
Common
ValueCountFrequency (%)
35246
39.2%
0 9797
 
10.9%
5 9109
 
10.1%
1 7837
 
8.7%
2 5976
 
6.6%
3 4771
 
5.3%
6 4396
 
4.9%
7 3938
 
4.4%
4 3294
 
3.7%
9 2761
 
3.1%
Other values (4) 2795
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 246307
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
35246
 
14.3%
r 14695
 
6.0%
e 13581
 
5.5%
a 11180
 
4.5%
d 10645
 
4.3%
0 9797
 
4.0%
n 9426
 
3.8%
5 9109
 
3.7%
t 9017
 
3.7%
i 8466
 
3.4%
Other values (53) 115145
46.7%

PostalCode
Categorical

Distinct 2689
Distinct (%) 18.1%
Missing 0
Missing (%) 0.0%
Memory size 115.9 KiB

L5B 2C9 201
L4T 2T9 98
L5M 4Z5 96
L5P 1B2 71
L5C 1V8 70
Other values (2684) 14289

Length

Max length 15
Median length 7
Mean length 6.9997302
Min length 6

Characters and Unicode

Total characters 103771
Distinct characters 38
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 883
Unique (%) 6.0%

Sample

1st row L5J 3R2
2nd row L5L 5Y7
3rd row L5L 5Y7
4th row L5L 5Y7
5th row L5L 5Y7
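
The length range (6 to 15 characters) and the handful of lowercase and control characters reported for PostalCode suggest that a few values deviate from the "A1A 1A1" layout seen in the sample rows. The sketch below shows one way such values could be normalised or flagged. The `normalise_postal_code` helper is hypothetical, and its pattern checks only the letter-digit layout, not whether the code is actually assigned.

```python
import re

# Letter-digit-letter, optional space, digit-letter-digit (layout check only).
POSTAL_RE = re.compile(r"([A-Z]\d[A-Z])\s?(\d[A-Z]\d)")

def normalise_postal_code(raw):
    """Return the code as 'A1A 1A1', or None if the layout does not match."""
    # Drop control characters, trim surrounding whitespace, force upper case.
    cleaned = "".join(ch for ch in raw if ch.isprintable()).strip().upper()
    match = POSTAL_RE.fullmatch(cleaned)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"

# Hypothetical inputs illustrating the kinds of deviation hinted at above.
examples = ["L5J 3R2", "l5l5y7", "L5B 2C9\r\n", "L5B 2C9 ext 1234"]
results = [normalise_postal_code(value) for value in examples]
print(results)
```

Values that come back as `None` would need manual review rather than automatic correction.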

Common Values

Value Count Frequency (%)
L5B 2C9 201 1.4%
L4T 2T9 98 0.7%
L5M 4Z5 96 0.6%
L5P 1B2 71 0.5%
L5C 1V8 70 0.5%
L5E 1V4 66 0.4%
L4W 5G6 58 0.4%
L5J 1K5 54 0.4%
L4X 1L4 53 0.4%
L4Y 1Y6 47 0.3%
Other values (2679) 14011 94.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
l4w 2371
 
8.0%
l5t 1606
 
5.4%
l5n 1164
 
3.9%
l4z 943
 
3.2%
l5l 872
 
2.9%
l5b 856
 
2.9%
l5s 848
 
2.9%
l5m 684
 
2.3%
l4t 676
 
2.3%
l5a 579
 
2.0%
Other values (1027) 19032
64.2%

Most occurring characters

ValueCountFrequency (%)
L 16432
15.8%
14805
14.3%
5 12036
11.6%
4 9038
 
8.7%
1 7411
 
7.1%
2 4954
 
4.8%
3 3117
 
3.0%
W 3097
 
3.0%
T 2872
 
2.8%
6 2172
 
2.1%
Other values (28) 27837
26.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 44480
42.9%
Decimal Number 44479
42.9%
Space Separator 14805
 
14.3%
Lowercase Letter 5
 
< 0.1%
Control 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
L 16432
36.9%
W 3097
 
7.0%
T 2872
 
6.5%
N 1815
 
4.1%
A 1748
 
3.9%
B 1695
 
3.8%
Z 1599
 
3.6%
C 1547
 
3.5%
V 1472
 
3.3%
M 1415
 
3.2%
Other values (12) 10788
24.3%
Decimal Number
ValueCountFrequency (%)
5 12036
27.1%
4 9038
20.3%
1 7411
16.7%
2 4954
11.1%
3 3117
 
7.0%
6 2172
 
4.9%
8 1837
 
4.1%
9 1769
 
4.0%
7 1556
 
3.5%
0 589
 
1.3%
Lowercase Letter
ValueCountFrequency (%)
k 2
40.0%
g 1
20.0%
c 1
20.0%
l 1
20.0%
Space Separator
ValueCountFrequency (%)
14805
100.0%
Control
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 59286
57.1%
Latin 44485
42.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 16432
36.9%
W 3097
 
7.0%
T 2872
 
6.5%
N 1815
 
4.1%
A 1748
 
3.9%
B 1695
 
3.8%
Z 1599
 
3.6%
C 1547
 
3.5%
V 1472
 
3.3%
M 1415
 
3.2%
Other values (16) 10793
24.3%
Common
ValueCountFrequency (%)
14805
25.0%
5 12036
20.3%
4 9038
15.2%
1 7411
12.5%
2 4954
 
8.4%
3 3117
 
5.3%
6 2172
 
3.7%
8 1837
 
3.1%
9 1769
 
3.0%
7 1556
 
2.6%
Other values (2) 591
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 103771
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L 16432
15.8%
14805
14.3%
5 12036
11.6%
4 9038
 
8.7%
1 7411
 
7.1%
2 4954
 
4.8%
3 3117
 
3.0%
W 3097
 
3.0%
T 2872
 
2.8%
6 2172
 
2.1%
Other values (28) 27837
26.8%

BldgNo
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct 61
Distinct (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory size 115.9 KiB

(blank) 14080
Bldg 1 158
Bldg 2 157
Bldg A 66
Bldg B 58
Other values (56) 306

Length

Max length 15
Median length 1
Mean length 1.2584823
Min length 1

Characters and Unicode

Total characters 18657
Distinct characters 48
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 22
Unique (%) 0.1%

Sample

1st row (blank)
2nd row (blank)
3rd row (blank)
4th row (blank)
5th row (blank)
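
BldgNo reports 0 missing cells, yet its most common value is a blank placeholder (14080 of 14825 rows, about 95.0%). The sketch below shows one way such placeholders could be converted to true missing values before re-profiling. The helper names and the four-item sample are hypothetical.

```python
def is_blank(value):
    """True for strings that are empty or contain only whitespace."""
    return isinstance(value, str) and value.strip() == ""

def blanks_to_missing(values):
    """Return a copy of values with blank strings replaced by None."""
    return [None if is_blank(v) else v for v in values]

def blank_share(values):
    """Fraction of entries that are blank placeholders."""
    return sum(is_blank(v) for v in values) / len(values)

# Hypothetical sample mixing placeholders with real building labels.
column = [" ", "Bldg 1", " ", "East Tower"]
cleaned = blanks_to_missing(column)
share = blank_share(column)
print(cleaned, share)

# Share implied by the counts in this report: 14080 blanks in 14825 rows.
reported_share = 14080 / 14825
print(round(reported_share * 100, 1))
```

The same treatment would apply to the other columns in this report whose most common value is blank, such as UnitNo, BIA_NAME and BIAFulName.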

Common Values

Value Count Frequency (%)
(blank) 14080 95.0%
Bldg 1 158 1.1%
Bldg 2 157 1.1%
Bldg A 66 0.4%
Bldg B 58 0.4%
Bldg 3 53 0.4%
Bldg 4 38 0.3%
Bldg K 23 0.2%
Bldg C 19 0.1%
East Tower 11 0.1%
Other values (51) 162 1.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
bldg 653
44.3%
1 172
 
11.7%
2 169
 
11.5%
a 74
 
5.0%
b 64
 
4.3%
3 57
 
3.9%
4 48
 
3.3%
plaza 26
 
1.8%
k 23
 
1.6%
c 20
 
1.4%
Other values (35) 168
 
11.4%

Most occurring characters

ValueCountFrequency (%)
14810
79.4%
B 722
 
3.9%
l 695
 
3.7%
g 673
 
3.6%
d 658
 
3.5%
1 197
 
1.1%
2 174
 
0.9%
a 99
 
0.5%
A 76
 
0.4%
3 57
 
0.3%
Other values (38) 496
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Space Separator 14810
79.4%
Lowercase Letter 2353
 
12.6%
Uppercase Letter 973
 
5.2%
Decimal Number 518
 
2.8%
Other Punctuation 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
B 722
74.2%
A 76
 
7.8%
P 26
 
2.7%
K 23
 
2.4%
C 22
 
2.3%
E 21
 
2.2%
T 18
 
1.8%
H 18
 
1.8%
D 10
 
1.0%
F 8
 
0.8%
Other values (9) 29
 
3.0%
Lowercase Letter
ValueCountFrequency (%)
l 695
29.5%
g 673
28.6%
d 658
28.0%
a 99
 
4.2%
r 42
 
1.8%
e 41
 
1.7%
z 26
 
1.1%
t 24
 
1.0%
o 24
 
1.0%
s 20
 
0.8%
Other values (7) 51
 
2.2%
Decimal Number
ValueCountFrequency (%)
1 197
38.0%
2 174
33.6%
3 57
 
11.0%
4 48
 
9.3%
9 10
 
1.9%
6 9
 
1.7%
5 7
 
1.4%
0 7
 
1.4%
7 5
 
1.0%
8 4
 
0.8%
Space Separator
ValueCountFrequency (%)
14810
100.0%
Other Punctuation
ValueCountFrequency (%)
& 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15331
82.2%
Latin 3326
 
17.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
B 722
21.7%
l 695
20.9%
g 673
20.2%
d 658
19.8%
a 99
 
3.0%
A 76
 
2.3%
r 42
 
1.3%
e 41
 
1.2%
z 26
 
0.8%
P 26
 
0.8%
Other values (26) 268
 
8.1%
Common
ValueCountFrequency (%)
14810
96.6%
1 197
 
1.3%
2 174
 
1.1%
3 57
 
0.4%
4 48
 
0.3%
9 10
 
0.1%
6 9
 
0.1%
5 7
 
< 0.1%
0 7
 
< 0.1%
7 5
 
< 0.1%
Other values (2) 7
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 18657
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14810
79.4%
B 722
 
3.9%
l 695
 
3.7%
g 673
 
3.6%
d 658
 
3.5%
1 197
 
1.1%
2 174
 
0.9%
a 99
 
0.5%
A 76
 
0.4%
3 57
 
0.3%
Other values (38) 496
 
2.7%

UnitNo
Categorical

Distinct 1602
Distinct (%) 10.8%
Missing 0
Missing (%) 0.0%
Memory size 115.9 KiB

(blank) 4942
1 519
2 413
3 360
4 351
Other values (1597) 8240

Length

Max length 37
Median length 1
Mean length 2.14914
Min length 1

Characters and Unicode

Total characters 31861
Distinct characters 65
Distinct categories 9
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1030
Unique (%) 6.9%

Sample

1st row (blank)
2nd row 14&15
3rd row 3&4
4th row 2
5th row 18

Common Values

Value Count Frequency (%)
(blank) 4942 33.3%
1 519 3.5%
2 413 2.8%
3 360 2.4%
4 351 2.4%
5 312 2.1%
6 280 1.9%
7 240 1.6%
8 214 1.4%
9 179 1.2%
Other values (1592) 7015 47.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1 607
 
5.4%
2 445
 
3.9%
3 419
 
3.7%
4 395
 
3.5%
5 367
 
3.2%
to 339
 
3.0%
6 323
 
2.9%
7 322
 
2.8%
8 291
 
2.6%
228
 
2.0%
Other values (1254) 7583
67.0%

Most occurring characters

ValueCountFrequency (%)
6377
20.0%
1 5271
16.5%
2 3431
10.8%
0 3366
10.6%
3 1837
 
5.8%
4 1560
 
4.9%
5 1315
 
4.1%
6 1097
 
3.4%
& 993
 
3.1%
7 916
 
2.9%
Other values (55) 5698
17.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 20303
63.7%
Space Separator 6377
 
20.0%
Lowercase Letter 1956
 
6.1%
Uppercase Letter 1866
 
5.9%
Other Punctuation 1137
 
3.6%
Dash Punctuation 201
 
0.6%
Close Punctuation 10
 
< 0.1%
Open Punctuation 10
 
< 0.1%
Control 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 531
28.5%
B 424
22.7%
C 175
 
9.4%
F 133
 
7.1%
D 99
 
5.3%
E 98
 
5.3%
L 60
 
3.2%
G 59
 
3.2%
H 57
 
3.1%
R 33
 
1.8%
Other values (14) 197
 
10.6%
Lowercase Letter
ValueCountFrequency (%)
o 571
29.2%
t 465
23.8%
r 156
 
8.0%
l 131
 
6.7%
e 120
 
6.1%
s 76
 
3.9%
n 61
 
3.1%
i 56
 
2.9%
h 51
 
2.6%
a 46
 
2.4%
Other values (12) 223
 
11.4%
Decimal Number
ValueCountFrequency (%)
1 5271
26.0%
2 3431
16.9%
0 3366
16.6%
3 1837
 
9.0%
4 1560
 
7.7%
5 1315
 
6.5%
6 1097
 
5.4%
7 916
 
4.5%
8 826
 
4.1%
9 684
 
3.4%
Other Punctuation
ValueCountFrequency (%)
& 993
87.3%
, 140
 
12.3%
/ 3
 
0.3%
. 1
 
0.1%
Space Separator
ValueCountFrequency (%)
6377
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 201
100.0%
Close Punctuation
ValueCountFrequency (%)
) 10
100.0%
Open Punctuation
ValueCountFrequency (%)
( 10
100.0%
Control
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 28039
88.0%
Latin 3822
 
12.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 571
14.9%
A 531
13.9%
t 465
12.2%
B 424
11.1%
C 175
 
4.6%
r 156
 
4.1%
F 133
 
3.5%
l 131
 
3.4%
e 120
 
3.1%
D 99
 
2.6%
Other values (36) 1017
26.6%
Common
ValueCountFrequency (%)
6377
22.7%
1 5271
18.8%
2 3431
12.2%
0 3366
12.0%
3 1837
 
6.6%
4 1560
 
5.6%
5 1315
 
4.7%
6 1097
 
3.9%
& 993
 
3.5%
7 916
 
3.3%
Other values (9) 1876
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 31861
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6377
20.0%
1 5271
16.5%
2 3431
10.8%
0 3366
10.6%
3 1837
 
5.8%
4 1560
 
4.9%
5 1315
 
4.1%
6 1097
 
3.4%
& 993
 
3.1%
7 916
 
2.9%
Other values (55) 5698
17.9%

Modified
Categorical

Distinct 189
Distinct (%) 1.3%
Missing 10
Missing (%) 0.1%
Memory size 115.9 KiB

2018/12/30 00:00:00+00 2771
2019/12/12 00:00:00+00 1848
2019/09/19 00:00:00+00 1586
2017/11/09 00:00:00+00 1111
2017/11/08 00:00:00+00 968
Other values (184) 6531

Length

Max length 22
Median length 22
Mean length 22
Min length 22

Characters and Unicode

Total characters 325930
Distinct characters 14
Distinct categories 4
Distinct scripts 1
Distinct blocks 1

Unique

Unique 50
Unique (%) 0.3%

Sample

1st row 2021/06/25 00:00:00+00
2nd row 2021/06/03 00:00:00+00
3rd row 2021/07/15 00:00:00+00
4th row 2021/07/15 00:00:00+00
5th row 2021/07/15 00:00:00+00
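
Modified is profiled as a categorical string even though every value follows a fixed 22-character timestamp layout. The sketch below shows one way the strings could be parsed into timezone-aware datetimes. It assumes the trailing "+00" is always a bare hour offset, which the report does not confirm, and `parse_modified` is a hypothetical helper.

```python
from datetime import datetime, timezone

def parse_modified(text):
    """Parse 'YYYY/MM/DD HH:MM:SS+HH' into an aware datetime, or None."""
    if text is None:
        # The report lists 10 missing cells for this column.
        return None
    # Pad the bare hour offset ("+00") with minutes so that %z accepts it.
    return datetime.strptime(text + "00", "%Y/%m/%d %H:%M:%S%z")

first = parse_modified("2021/06/25 00:00:00+00")
print(first.isoformat())
print(first == datetime(2021, 6, 25, tzinfo=timezone.utc))
```

Since every time component in the sample is 00:00:00, keeping only the date part may be sufficient for most analyses.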

Common Values

Value Count Frequency (%)
2018/12/30 00:00:00+00 2771 18.7%
2019/12/12 00:00:00+00 1848 12.5%
2019/09/19 00:00:00+00 1586 10.7%
2017/11/09 00:00:00+00 1111 7.5%
2017/11/08 00:00:00+00 968 6.5%
2021/07/02 00:00:00+00 354 2.4%
2019/06/07 00:00:00+00 267 1.8%
2021/05/21 00:00:00+00 186 1.3%
2018/09/30 00:00:00+00 177 1.2%
2021/05/17 00:00:00+00 168 1.1%
Other values (179) 5379 36.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
00:00:00+00 14815
50.0%
2018/12/30 2771
 
9.4%
2019/12/12 1848
 
6.2%
2019/09/19 1586
 
5.4%
2017/11/09 1111
 
3.7%
2017/11/08 968
 
3.3%
2021/07/02 354
 
1.2%
2019/06/07 267
 
0.9%
2021/05/21 186
 
0.6%
2018/09/30 177
 
0.6%
Other values (180) 5547
 
18.7%

Most occurring characters

ValueCountFrequency (%)
0 148805
45.7%
1 29895
 
9.2%
/ 29630
 
9.1%
: 29630
 
9.1%
2 29181
 
9.0%
14815
 
4.5%
+ 14815
 
4.5%
9 8963
 
2.7%
7 6006
 
1.8%
8 5090
 
1.6%
Other values (4) 9100
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 237040
72.7%
Other Punctuation 59260
 
18.2%
Space Separator 14815
 
4.5%
Math Symbol 14815
 
4.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 148805
62.8%
1 29895
 
12.6%
2 29181
 
12.3%
9 8963
 
3.8%
7 6006
 
2.5%
8 5090
 
2.1%
3 3797
 
1.6%
6 2508
 
1.1%
5 2286
 
1.0%
4 509
 
0.2%
Other Punctuation
ValueCountFrequency (%)
/ 29630
50.0%
: 29630
50.0%
Space Separator
ValueCountFrequency (%)
14815
100.0%
Math Symbol
ValueCountFrequency (%)
+ 14815
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 325930
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 148805
45.7%
1 29895
 
9.2%
/ 29630
 
9.1%
: 29630
 
9.1%
2 29181
 
9.0%
14815
 
4.5%
+ 14815
 
4.5%
9 8963
 
2.7%
7 6006
 
1.8%
8 5090
 
1.6%
Other values (4) 9100
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 325930
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 148805
45.7%
1 29895
 
9.2%
/ 29630
 
9.1%
: 29630
 
9.1%
2 29181
 
9.0%
14815
 
4.5%
+ 14815
 
4.5%
9 8963
 
2.7%
7 6006
 
1.8%
8 5090
 
1.6%
Other values (4) 9100
 
2.8%

PIN
Real number (ℝ)

Distinct 4401
Distinct (%) 29.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 11578612
Minimum 32500
Maximum 32656400
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 115.9 KiB

Quantile statistics

Minimum 32500
5-th percentile 1879320
Q1 5174400
median 10177400
Q3 14791600
95-th percentile 31096500
Maximum 32656400
Range 32623900
Interquartile range (IQR) 9617200

Descriptive statistics

Standard deviation 8034094.9
Coefficient of variation (CV) 0.69387374
Kurtosis 0.46781438
Mean 11578612
Median Absolute Deviation (MAD) 4699200
Skewness 1.0392563
Sum 1.7165292 × 10^11
Variance 6.454668 × 10^13
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value Count Frequency (%)
6068300 201 1.4%
31141506 121 0.8%
4407700 101 0.7%
9663800 100 0.7%
12876900 66 0.4%
24265600 63 0.4%
14804200 58 0.4%
31381800 55 0.4%
17704200 53 0.4%
10248600 45 0.3%
Other values (4391) 13962 94.2%

Minimum 10 values

Value Count Frequency (%)
32500 1 < 0.1%
37200 4 < 0.1%
37400 11 0.1%
38300 2 < 0.1%
38400 4 < 0.1%
38600 4 < 0.1%
38800 1 < 0.1%
38900 2 < 0.1%
39300 1 < 0.1%
39800 1 < 0.1%

Maximum 10 values

Value Count Frequency (%)
32656400 1 < 0.1%
32646400 44 0.3%
32551400 1 < 0.1%
32526400 2 < 0.1%
32476400 11 0.1%
32442000 5 < 0.1%
32441600 2 < 0.1%
32436400 25 0.2%
32431500 43 0.3%
32371800 1 < 0.1%

CHArea
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct 56
Distinct (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory size 115.9 KiB

Northeast EA (West) 4195
Gateway EA (East) 912
Dixie EA 911
Meadowvale Business Park CC 871
Western Business Park EA 753
Other values (51) 7183

Length

Max length 27
Median length 23
Mean length 16.492951
Min length 7

Characters and Unicode

Total characters 244508
Distinct characters 44
Distinct categories 6
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Clarkson - Lorne Park NHD
2nd row Western Business Park EA
3rd row Western Business Park EA
4th row Western Business Park EA
5th row Western Business Park EA

Common Values

Value Count Frequency (%)
Northeast EA (West) 4195 28.3%
Gateway EA (East) 912 6.2%
Dixie EA 911 6.1%
Meadowvale Business Park CC 871 5.9%
Western Business Park EA 753 5.1%
DT Core 700 4.7%
Airport CC 469 3.2%
Northeast EA (East) 380 2.6%
Mavis-Erindale EA 369 2.5%
DT Cooksville 351 2.4%
Other values (46) 4914 33.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ea 7951
19.4%
northeast 4575
 
11.2%
west 4502
 
11.0%
nhd 2650
 
6.5%
park 1816
 
4.4%
east 1734
 
4.2%
business 1624
 
4.0%
cc 1601
 
3.9%
gateway 1337
 
3.3%
dt 1203
 
2.9%
Other values (45) 12033
29.3%

Most occurring characters

ValueCountFrequency (%)
26201
 
10.7%
e 22092
 
9.0%
t 21076
 
8.6%
s 18779
 
7.7%
a 16340
 
6.7%
r 12757
 
5.2%
o 11344
 
4.6%
E 10560
 
4.3%
A 9038
 
3.7%
i 8691
 
3.6%
Other values (34) 87630
35.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 146770
60.0%
Uppercase Letter 58969
24.1%
Space Separator 26201
 
10.7%
Close Punctuation 5968
 
2.4%
Open Punctuation 5968
 
2.4%
Dash Punctuation 632
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 22092
15.1%
t 21076
14.4%
s 18779
12.8%
a 16340
11.1%
r 12757
8.7%
o 11344
7.7%
i 8691
 
5.9%
l 5994
 
4.1%
h 5421
 
3.7%
n 5051
 
3.4%
Other values (12) 19225
13.1%
Uppercase Letter
ValueCountFrequency (%)
E 10560
17.9%
A 9038
15.3%
N 8505
14.4%
C 6728
11.4%
W 5279
9.0%
D 4764
8.1%
H 2911
 
4.9%
M 2670
 
4.5%
P 2274
 
3.9%
B 1624
 
2.8%
Other values (8) 4616
7.8%
Space Separator
ValueCountFrequency (%)
26201
100.0%
Close Punctuation
ValueCountFrequency (%)
) 5968
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5968
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 632
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 205739
84.1%
Common 38769
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 22092
 
10.7%
t 21076
 
10.2%
s 18779
 
9.1%
a 16340
 
7.9%
r 12757
 
6.2%
o 11344
 
5.5%
E 10560
 
5.1%
A 9038
 
4.4%
i 8691
 
4.2%
N 8505
 
4.1%
Other values (30) 66557
32.4%
Common
ValueCountFrequency (%)
26201
67.6%
) 5968
 
15.4%
( 5968
 
15.4%
- 632
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 244508
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
26201
 
10.7%
e 22092
 
9.0%
t 21076
 
8.6%
s 18779
 
7.7%
a 16340
 
6.7%
r 12757
 
5.2%
o 11344
 
4.6%
E 10560
 
4.3%
A 9038
 
3.7%
i 8691
 
3.6%
Other values (34) 87630
35.8%

Ward
Real number (ℝ)

Distinct 11
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 5.3857673
Minimum 1
Maximum 11
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 115.9 KiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 5
median 5
Q3 7
95-th percentile 11
Maximum 11
Range 10
Interquartile range (IQR) 2

Descriptive statistics

Standard deviation 2.4612868
Coefficient of variation (CV) 0.45699836
Kurtosis 0.06594084
Mean 5.3857673
Median Absolute Deviation (MAD) 1
Skewness 0.37329748
Sum 79844
Variance 6.0579329
Monotonicity Not monotonic
Histogram with fixed size bins (bins=11)
Common values

Value Count Frequency (%)
5 6575 44.4%
1 1250 8.4%
8 1116 7.5%
7 986 6.7%
3 957 6.5%
9 881 5.9%
4 827 5.6%
11 823 5.6%
6 679 4.6%
2 582 3.9%

Minimum 10 values

Value Count Frequency (%)
1 1250 8.4%
2 582 3.9%
3 957 6.5%
4 827 5.6%
5 6575 44.4%
6 679 4.6%
7 986 6.7%
8 1116 7.5%
9 881 5.9%
10 149 1.0%

Maximum 10 values

Value Count Frequency (%)
11 823 5.6%
10 149 1.0%
9 881 5.9%
8 1116 7.5%
7 986 6.7%
6 679 4.6%
5 6575 44.4%
4 827 5.6%
3 957 6.5%
2 582 3.9%

BIA_NAME
Categorical

Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 115.9 KiB

(blank) 13414
CK 443
MLT 362
PC 304
STR 215

Length

Max length 3
Median length 1
Mean length 1.1399663
Min length 1

Characters and Unicode

Total characters 16900
Distinct characters 10
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row (blank)
2nd row (blank)
3rd row (blank)
4th row (blank)
5th row (blank)

Common Values

Value Count Frequency (%)
(blank) 13414 90.5%
CK 443 3.0%
MLT 362 2.4%
PC 304 2.1%
STR 215 1.5%
CLV 87 0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
ck 443
31.4%
mlt 362
25.7%
pc 304
21.5%
str 215
15.2%
clv 87
 
6.2%

Most occurring characters

ValueCountFrequency (%)
13414
79.4%
C 834
 
4.9%
T 577
 
3.4%
L 449
 
2.7%
K 443
 
2.6%
M 362
 
2.1%
P 304
 
1.8%
S 215
 
1.3%
R 215
 
1.3%
V 87
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Space Separator 13414
79.4%
Uppercase Letter 3486
 
20.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 834
23.9%
T 577
16.6%
L 449
12.9%
K 443
12.7%
M 362
10.4%
P 304
 
8.7%
S 215
 
6.2%
R 215
 
6.2%
V 87
 
2.5%
Space Separator
ValueCountFrequency (%)
13414
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 13414
79.4%
Latin 3486
 
20.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 834
23.9%
T 577
16.6%
L 449
12.9%
K 443
12.7%
M 362
10.4%
P 304
 
8.7%
S 215
 
6.2%
R 215
 
6.2%
V 87
 
2.5%
Common
ValueCountFrequency (%)
13414
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16900
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
13414
79.4%
C 834
 
4.9%
T 577
 
3.4%
L 449
 
2.7%
K 443
 
2.6%
M 362
 
2.1%
P 304
 
1.8%
S 215
 
1.3%
R 215
 
1.3%
V 87
 
0.5%

BIAFulName
Categorical

Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 115.9 KiB

(blank) 13414
Cooksville BIA 443
Malton BIA 362
Port Credit BIA 304
Streetsville BIA 215

Length

Max length 16
Median length 1
Mean length 2.177403
Min length 1

Characters and Unicode

Total characters 32280
Distinct characters 20
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row (blank)
2nd row (blank)
3rd row (blank)
4th row (blank)
5th row (blank)

Common Values

Value Count Frequency (%)
(blank) 13414 90.5%
Cooksville BIA 443 3.0%
Malton BIA 362 2.4%
Port Credit BIA 304 2.1%
Streetsville BIA 215 1.5%
Clarkson BIA 87 0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
bia 1411
45.1%
cooksville 443
 
14.2%
malton 362
 
11.6%
port 304
 
9.7%
credit 304
 
9.7%
streetsville 215
 
6.9%
clarkson 87
 
2.8%

Most occurring characters

Value              Count  Frequency (%)
(space)            15129  46.9%
l                  1765   5.5%
o                  1639   5.1%
A                  1411   4.4%
B                  1411   4.4%
I                  1411   4.4%
t                  1400   4.3%
e                  1392   4.3%
i                  962    3.0%
r                  910    2.8%
Other values (10)  4850   15.0%

Most occurring categories

Value             Count  Frequency (%)
Space Separator   15129  46.9%
Lowercase Letter  11203  34.7%
Uppercase Letter  5948   18.4%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
l                 1765   15.8%
o                 1639   14.6%
t                 1400   12.5%
e                 1392   12.4%
i                 962    8.6%
r                 910    8.1%
s                 745    6.7%
v                 658    5.9%
k                 530    4.7%
a                 449    4.0%
Other values (2)  753    6.7%

Uppercase Letter
Value  Count  Frequency (%)
A      1411   23.7%
B      1411   23.7%
I      1411   23.7%
C      834    14.0%
M      362    6.1%
P      304    5.1%
S      215    3.6%

Space Separator
Value    Count  Frequency (%)
(space)  15129  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   17151  53.1%
Common  15129  46.9%

Most frequent character per script

Latin
Value             Count  Frequency (%)
l                 1765   10.3%
o                 1639   9.6%
A                 1411   8.2%
B                 1411   8.2%
I                 1411   8.2%
t                 1400   8.2%
e                 1392   8.1%
i                 962    5.6%
r                 910    5.3%
C                 834    4.9%
Other values (9)  4016   23.4%

Common
Value    Count  Frequency (%)
(space)  15129  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  32280  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            15129  46.9%
l                  1765   5.5%
o                  1639   5.1%
A                  1411   4.4%
B                  1411   4.4%
I                  1411   4.4%
t                  1400   4.3%
e                  1392   4.3%
i                  962    3.0%
r                  910    2.8%
Other values (10)  4850   15.0%

X
Real number (ℝ)

Distinct 4401
Distinct (%) 29.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 608574.17
Minimum 596636.32
Maximum 617060.11
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 115.9 KiB

Quantile statistics

Minimum 596636.32
5-th percentile 601400.51
Q1 606510.1
median 608810.72
Q3 611114.99
95-th percentile 614657.5
Maximum 617060.11
Range 20423.788
Interquartile range (IQR) 4604.8848

Descriptive statistics

Standard deviation 3796.9463
Coefficient of variation (CV) 0.0062390855
Kurtosis 0.023282621
Mean 608574.17
Median Absolute Deviation (MAD) 2304.2647
Skewness -0.41253221
Sum 9.022112 × 10^9
Variance 14416801
Monotonicity Not monotonic
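
Several of the figures above follow directly from others, which gives a quick consistency check. The sketch below re-derives the range, IQR, coefficient of variation, variance and sum for X from the rounded values printed in this report (with n taken from the dataset's 14825 observations), so small differences in the last digits are expected. The variable names are illustrative only.

```python
# Rounded values copied from the X summary in this report.
minimum, maximum = 596636.32, 617060.11
q1, q3 = 606510.1, 611114.99
mean, std = 608574.17, 3796.9463
n = 14825  # number of observations in the dataset

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # CV = standard deviation / mean
variance = std ** 2               # Variance = standard deviation squared
total = mean * n                  # Sum = mean * number of observations

print(f"range    = {value_range:.2f}")
print(f"IQR      = {iqr:.2f}")
print(f"CV       = {cv:.10f}")
print(f"variance = {variance:.0f}")
print(f"sum      = {total:.6e}")
```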
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
609566.1112          201    1.4%
607701.737           121    0.8%
604057.4854          101    0.7%
609718.3353          100    0.7%
615498.4771          66     0.4%
608544.3664          63     0.4%
611821.5063          58     0.4%
600127.9471          55     0.4%
606588.0826          53     0.4%
606860.5153          45     0.3%
Other values (4391)  13962  94.2%
Value        Count  Frequency (%)
596636.3174  1      < 0.1%
596761.7476  1      < 0.1%
597263.154   1      < 0.1%
597730.9671  23     0.2%
597763.1149  2      < 0.1%
597816.374   2      < 0.1%
597920.1761  20     0.1%
597939.1033  9      0.1%
598058.8187  1      < 0.1%
598115.7963  14     0.1%
Value        Count  Frequency (%)
617060.1055  1      < 0.1%
616918.4738  1      < 0.1%
616839.6893  1      < 0.1%
616837.5953  1      < 0.1%
616769.3441  1      < 0.1%
616704.5391  1      < 0.1%
616692.2284  1      < 0.1%
616667.6043  1      < 0.1%
616657.8816  1      < 0.1%
616643.3766  1      < 0.1%

Y
Real number (ℝ)

Distinct 4401
Distinct (%) 29.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 4829936.5
Minimum 4815549.4
Maximum 4843106.9
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 115.9 KiB

Quantile statistics

Minimum 4815549.4
5-th percentile 4819734.2
Q1 4826194.8
median 4829859.2
Q3 4834052.3
95-th percentile 4839492.4
Maximum 4843106.9
Range 27557.528
Interquartile range (IQR) 7857.4572

Descriptive statistics

Standard deviation 5679.5754
Coefficient of variation (CV) 0.001175911
Kurtosis -0.58402043
Mean 4829936.5
Median Absolute Deviation (MAD) 3910.0007
Skewness -0.065617821
Sum 7.1603809 × 10^10
Variance 32257576
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
4827535.97           201    1.4%
4838234.833          121    0.8%
4823601.861          101    0.7%
4841653.08           100    0.7%
4827677.175          66     0.4%
4840490.34           63     0.4%
4832002.583          58     0.4%
4826225.623          55     0.4%
4836449.15           53     0.4%
4834906.53           45     0.3%
Other values (4391)  13962  94.2%
Value        Count  Frequency (%)
4815549.405  1      < 0.1%
4815601.213  1      < 0.1%
4816100.511  1      < 0.1%
4816303.869  1      < 0.1%
4816361.694  1      < 0.1%
4816457.235  1      < 0.1%
4816654.247  1      < 0.1%
4816757.055  2      < 0.1%
4816794.23   1      < 0.1%
4816814.742  1      < 0.1%
Value        Count  Frequency (%)
4843106.933  3      < 0.1%
4843045.912  1      < 0.1%
4842995.781  2      < 0.1%
4842852.901  1      < 0.1%
4842722.486  1      < 0.1%
4842531.982  2      < 0.1%
4842304.058  2      < 0.1%
4842274.717  1      < 0.1%
4842274.399  2      < 0.1%
4842200.556  2      < 0.1%

Interactions

Correlations

Auto

The auto setting chooses an interpretable pairwise metric for each pair of columns according to the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramér's V, [0,1]
  • Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
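
The mapping can be pictured as a small dispatch on the two column types, with a binning step for numerical columns that are paired with categorical ones. The sketch below is only an illustration: the function names, the default of 10 bins and the equal-width binning scheme are assumptions made for this example, not a description of the library's internals.

```python
def auto_method(type_a: str, type_b: str) -> str:
    """Pick the association measure for a column pair, following the mapping above."""
    if type_a == "Numerical" and type_b == "Numerical":
        return "Spearman's ρ"
    # Any pair involving a categorical column uses Cramér's V; a numerical
    # column in such a pair is discretized first.
    return "Cramér's V"


def discretize(values, n_bins=10):
    """Assign each number to one of n_bins equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant columns
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(auto_method("Numerical", "Categorical"))
print(discretize([0.0, 5.0, 10.0], n_bins=2))
```

Fewer bins give a coarser view of the numerical column, while more bins capture finer structure but leave fewer observations per cell.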

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
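
A minimal pure-Python sketch of that calculation follows. It assumes tied values receive the average of their rank positions, which is a common convention but not one stated in this report; the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    # The 1/n factors of the covariance and the standard deviations cancel.
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship still gives ρ = 1.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```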

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
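
The sketch below computes r on a small made-up sample and then again after shifting and rescaling each variable, to illustrate the invariance mentioned above. The data and the function name are illustrative only. The scale factors are kept positive; a negative factor on one variable would flip the sign of r.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    # The 1/n factors of the covariance and the standard deviations cancel.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
# Separate changes of location and (positive) scale for each variable.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(r, r_rescaled)
```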

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
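
The pair-counting definition translates directly into a short loop. The sketch below implements exactly the formula as worded here (often called τ-a), where tied pairs count as neither concordant nor discordant. Statistical libraries commonly use tie-adjusted variants, so results may differ when the data contain ties. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Six pairs in total: five concordant and one discordant.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```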

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
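
For orientation, the sketch below computes a bias-corrected Cramér's V from two lists of category labels, using the correction as it is usually stated: the chi-squared statistic divided by n is reduced by (k-1)(r-1)/(n-1), and the row and column counts r and k are shrunk correspondingly. This is a hand-written illustration under that assumption, not the code that produced this report; the function name is hypothetical.

```python
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for ra, na in row.items():
        for cb, nb in col.items():
            expected = na * nb / n
            observed = joint.get((ra, cb), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row), len(col)
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return (phi2_corr / denom) ** 0.5 if denom > 0 else 0.0


labels = ["x", "y"] * 50
print(cramers_v_corrected(labels, labels))  # identical columns: perfect association
print(cramers_v_corrected(["x", "x", "y", "y"] * 25, ["u", "v", "u", "v"] * 25))  # independent
```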

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

FID ID Name EmplRange NAICSTitle NAICSDescr NAICSCode Phone Fax TollFree EMail WebAddress StreetNo StreetName Address PostalCode BldgNo UnitNo Modified PIN CHArea Ward BIA_NAME BIAFulName X Y
0194017Garderie La Fontaine De I'Amitie5 to 9Health CareChild Day-Care Services624410905-822-8902contact@lafontaindeamitie.cawww.lafontaindeamitie.ca1445Lewisham Dr1445 Lewisham DrL5J 3R22021/06/25 00:00:00+001505000Clarkson - Lorne Park NHD2609549.25554.819128e+06
1211793KONE Canada Inc.5 to 9ManufacturingOther Metalworking Machinery Manufacturing333519905-820-6034905-820-7189koneservice@kone.comwww.kone.ca3505Laird Rd3505 Laird RdL5L 5Y714&152021/06/03 00:00:00+0017999600Western Business Park EA8605278.20144.819129e+06
2311796Biesse Canada10 to 19ManufacturingSawmill and woodworking machinery manufacturing333245416-525-9110450-477-04841-800-598-3202matt.fleming@biessecanada.comwww.biessecanada.com3505Laird Rd3505 Laird RdL5L 5Y73&42021/07/15 00:00:00+0017999600Western Business Park EA8605278.20144.819129e+06
3411799Trimart Corporation5 to 9FinanceMortgage and Non-mortgage Loan Brokers522310905-820-6711905-820-5669Priority@trimart.cawww.trimart.ca3505Laird Rd3505 Laird RdL5L 5Y722021/07/15 00:00:00+0017999600Western Business Park EA8605278.20144.819129e+06
4511800S A W Technology1 to 4WholesaleAll Other Machinery, Equipment and Supplies Wholesaler-Distributors417990905-567-1804mark@sawtechnology.comwww.sawtechnology.com3505Laird Rd3505 Laird RdL5L 5Y7182021/07/15 00:00:00+0017999600Western Business Park EA8605278.20144.819129e+06
5617023Jack Elite Industrial Inc.5 to 9WholesaleElectronic Components, Navigational and Communications Equipment and Supplies Wholesaler-Distributors417320905-569-6988905-569-8819info@jackelite.comwww.jackelite.com3200Ridgeway Dr3200 Ridgeway DrL5L 5Y69&102017/11/09 00:00:00+0017999600Western Business Park EA8605278.20144.819129e+06
6717025Hite Engineering Corp.5 to 9ProfessionalEngineering Services541330905-812-3709Stephanie@hite.ca3200Ridgeway Dr3200 Ridgeway DrL5L 5Y6172021/07/15 00:00:00+0017999600Western Business Park EA8605278.20144.819129e+06
7817028Quality Custom Blending Ltd.10 to 19ManufacturingFlour Mixes and Dough Manufacturing from Purchased Flour311822905-569-0067jean@qualitycustomblending.cawww.qualitycustomblending.ca3200Ridgeway Dr3200 Ridgeway DrL5L 5Y64&52021/05/17 00:00:00+0017999600Western Business Park EA8605278.20144.819129e+06
8917029Ontario Shake N' Tile5 to 9ConstructionRoofing Contractors238160905-828-7663905-828-34451-888-271-7119info@ontarioshakentile.comwww.ontarioshakentile.com3200Ridgeway Dr3200 Ridgeway DrL5L 5Y611&122021/07/15 00:00:00+0017999600Western Business Park EA8605278.20144.819129e+06
91017030Custom Engineered Millwork Ltd.10 to 19ManufacturingWood Kitchen Cabinet and Counter Top Manufacturing337110905-828-9555905-828-9776Roxanne@ce-millwork.comwww.ce-millwork.com3200Ridgeway Dr3200 Ridgeway DrL5L 5Y614 to 162021/07/15 00:00:00+0017999600Western Business Park EA8605278.20144.819129e+06
FID ID Name EmplRange NAICSTitle NAICSDescr NAICSCode Phone Fax TollFree EMail WebAddress StreetNo StreetName Address PostalCode BldgNo UnitNo Modified PIN CHArea Ward BIA_NAME BIAFulName X Y
148151481657550Advance Car & Truck Rental1 to 4Real EstatePassenger Car Rental532111905-461-7368905-461-66661-877-303-7368Advancerental@gmail.comwww.advancerental.ca2960Drew Rd2960 Drew RdL4T 0A51492021/06/22 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06
148161481757551Video Palace1 to 4Real EstateAll Other Consumer Goods Rental532280905-678-78782960Drew Rd2960 Drew RdL4T 0A51502021/06/02 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06
148171481857552Secure Life Insurance Agency Inc.NaNFinanceDirect Group Life, Health and Medical Insurance Carriers5241121-800-746-9122www.securelifeinsurance.ca2960Drew Rd2960 Drew RdL4T 0A51512018/12/30 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06
148181481957555Skillman Flooring1 to 4RetailFloor Covering Stores442210905-676-9111905-676-9113skillmanflooring@live.cawww.skillmanflooring.com2960Drew Rd2960 Drew RdL4T 0A5155&157B2019/12/12 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06
148191482057557Verma Vastar Manufacturing Inc.1 to 4ManufacturingCut and Sew Clothing Contracting315210647-669-45452960Drew Rd2960 Drew RdL4T 0A51602018/12/30 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06
148201482160142JobsForU10 to 19AdministrativeEmployment Placement Agencies and Executive Search Services561310416-825-4000navjot@jobsforu.cawww.jobsforu.ca2960Drew Rd2960 Drew RdL4T 0A51562021/07/30 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06
148211482260159Elite Source SolutionsNaNAdministrativeEmployment Placement Agencies and Executive Search Services561310905-598-35422980Drew Rd2980 Drew RdL4T 0A71332018/12/30 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06
148221482360160Indian Sweet MasterNaNAccommodationFull-service restaurants722511905-405-85852980Drew Rd2980 Drew RdL4T 0A71342018/12/30 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06
148231482460161Mississauga Flooring & Supplies Inc.1 to 4WholesaleFloor Covering Wholesaler-Distributors414320905-460-70052980Drew Rd2980 Drew RdL4T 0A7135 & 1362021/08/16 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06
148241482560162Punjabi Textile Ltd.NaNWholesaleClothing and Clothing Accessories Wholesaler-Distributors414110905-405-19192980Drew Rd2980 Drew RdL4T 0A71322018/12/30 00:00:00+0024265600Northeast EA (West)5MLTMalton BIA608544.36644.840490e+06